Abstract: The theory of zero-error communication is
re-examined in the broader setting of using one classical 
channel to simulate another exactly in the presence of 
various classes of non-signalling correlations between 
sender and receiver, i.e. shared randomness, shared
entanglement and arbitrary non-signalling correlations. 
When the channel being simulated is noiseless, this 
is zero-error coding assisted by correlations. When 
the resource channel is noiseless, it is the reverse 
problem of simulating a noisy channel exactly by a 
noiseless one, assisted by correlations. In both cases, 
separations between the power of the different classes 
of assisting correlations are exhibited for finite block 
lengths. The most striking result here is that entan- 
glement can assist in zero-error communication. In 
the large block length limit, shared randomness is 
shown to be just as powerful as arbitrary non-signalling 
correlations for exact simulation, but not for asymptotic 
zero-error coding. For assistance by arbitrary non- 
signalling correlations, linear programming formulas 
for the asymptotic capacity and simulation rates are 
derived, the former being equal (for channels with non- 
zero unassisted capacity) to the feedback-assisted zero- 
error capacity derived by Shannon. Finally, a kind of 
reversibility between non-signalling-assisted zero-error 
capacity and exact simulation is observed, mirroring 
the usual reverse Shannon theorem. 



I. Introduction 

Much of classical and quantum information theory 
is concerned with the use of one resource (a channel, 
an entangled state, etc.) to simulate another. Typically
errors are allowed in the simulation protocol if they 
vanish asymptotically as the number of resources 
involved grows. One then asks for the asymptotic 
rates of resource exchange: Shannon's channel coding 
theorem [2] tells us the asymptotic rate at which
we need to make use of a given discrete memoryless
channel to simulate a perfect bit channel. The
quantum reverse Shannon theorem [5] shows that a
single number associated to quantum channels, the
entanglement-assisted classical capacity $C_E$, determines
the rate at which it can simulate another when
entanglement is a free resource. Since $C_E$ reduces
to the Shannon capacity for classical channels, the 
availability of entanglement does not affect the rate 
at which one classical channel can simulate another, 
in the setting where errors which vanish in the large 
block length limit are tolerated. 

Since it is often unrealistic to assume that arbitrar- 
ily long block lengths can be used in encoding and 
decoding, an alternative, idealised, task of zero-error 
coding [18] has been considered since the seminal
1956 paper of Shannon [1] and more recently in
quantum information theory Q, E), S, ifTOl . 

For a suitable definition of decoding error probability
$p_e$, both asymptotic and zero-error coding theory
make statements about the region of triples $(n, k, p_e)$
which can be achieved by codes which use $n$ channel
uses to transmit $k$ bits (or, equivalently, one of $2^k$
symbols). The full characterisation of this achievable
region is normally far from tractable. Whereas the
freedom granted by demanding only that $p_e \to 0$
as $n \to \infty$ admits simplification via random coding
arguments (for example) in the asymptotic theory, the
zero-error theory (which studies the restriction of the
region to the plane $p_e = 0$) is attractive because the
problem becomes essentially combinatorial. Nevertheless,
it is a source of hard mathematical problems:
in Shannon's groundbreaking work on the subject [1]
he made a conjecture (on the zero-error capacity of
the pentagon channel) which had to wait over twenty
years before it was proven by Lovász [19]. Many
related open problems remain [18].

In this paper we consider both zero-error coding 
and the "reverse" problem of exact simulation of 
noisy channels when various types of correlations 
between sender and receiver are freely available. 
This leads to various relaxations of the combinatorial 
problems posed by the unassisted theory, some of 
which have complete and general solutions. 

II. Overview 

This section introduces the central concepts and
quantities dealt with in the rest of the paper (please
note that an index of notations is provided as an
appendix). $\mathcal{C}(X \to Y)$ denotes the set of discrete,
memoryless, classical channels (i.e. conditional probability
distributions) with inputs in $X$ and outputs in
$Y$, $X$ and $Y$ being finite sets. $\mathcal{C}(A \to S, B \to T)$
means the set of bipartite conditional probability
distributions, with inputs in the set $A$ and outputs
in $S$ for Alice, and inputs in $B$ and outputs in
$T$ for Bob. We will frequently consider bipartite
distributions that are non-signalling, which we will
refer to as correlations. A class $\Omega$ of correlations is a
subset of all possible bipartite conditional probability
distributions defined by some property such that the
set is closed under local operations by either party, in
addition to all distributions in $\Omega$ being non-signalling.
We denote the subset of $\mathcal{C}(A \to S, B \to T)$ which is
in the class $\Omega$ by $\Omega(A \to S, B \to T)$.

Here we deal with the following classes of cor- 
relations: A bipartite channel is in NC if it can 
be implemented by local operations alone — there 
are No Correlations between the two parties at all. 
Correlations belong to SR if they can be obtained 
using (classical) Shared Randomness (and local op- 
erations); to SE (Shared Entanglement) if they can be 
obtained from local operations on a shared quantum 
state; and to NS if the correlation is Non-Signalling 
in both directions: That is, the marginal distribution 
of Alice's output is independent of Bob's input and 
vice versa. Each class in this list has a strictly 
weaker defining property than the last, so we have 
$\mathrm{NC}(A \to S, B \to T) \subset \mathrm{SR}(A \to S, B \to T) \subset
\mathrm{SE}(A \to S, B \to T) \subset \mathrm{NS}(A \to S, B \to T)$.

If Alice and Bob are connected by a classical
channel $\mathcal{N} \in \mathcal{C}(X \to Y)$ and have access to any
correlation in class $\Omega$ (shared randomness, entanglement
etc.) then they can exactly simulate $\mathcal{M} \in \mathcal{C}(Q \to R)$
if there is a local protocol whereby Alice takes an
input $q \in Q$ and, through local operations and a
single use of $\mathcal{N}$ and any use of $\Omega$, Bob produces an
output $r \in R$, such that the conditional probability
of $r$ given $q$ is exactly $\mathcal{M}(r|q)$. To say that $n$ uses
of $\mathcal{N}$ can exactly simulate $m$ uses of $\mathcal{M}$ means that
$\mathcal{N}^{\otimes n}$ can exactly simulate $\mathcal{M}^{\otimes m}$.

On pairs consisting of a bipartite correlation $P \in
\Omega(Q \to X, Y \to R)$ and a classical channel $\mathcal{N} \in
\mathcal{C}(X \to Y)$ we define a bilinear map $W$ which
corresponds to 'wiring' Alice's output of $P$ to the
input of $\mathcal{N}$ and the output of $\mathcal{N}$ to Bob's input to $P$
to produce a new classical channel $\mathcal{M} = W[P, \mathcal{N}]$,
with

$$\mathcal{M}(r|q) := \sum_{x \in X,\, y \in Y} P(x, r|q, y)\, \mathcal{N}(y|x).$$

Because of the time ordering involved, this only
makes operational sense if $\Omega$ is non-signalling from
Bob to Alice, and if it is not then $\mathcal{M}$ may not
be a valid conditional distribution. See Figure 1 for
a diagram of the operational meaning. Set valued
arguments to $W$ are given the natural interpretation
as yielding the image sets of classical channels.
Fig. 1. Schematic representation of $\mathcal{M} = W[P, \mathcal{N}]$: A pre-shared
resource (randomness, entanglement or maybe something
non-physical) is portrayed by a dotted line. Alice and Bob interact
with this, resulting in the non-signalling correlation $P(x, r|q, y) \in
\mathcal{C}(Q \to X, Y \to R)$: Alice goes first, and obtains $x$ which she
inputs into the channel $\mathcal{N} \in \mathcal{C}(X \to Y)$. Then, based on the channel
output $y$, Bob interacts with the correlation resource, obtaining an
outcome $r$. For example, if the resource is an entangled system,
then w.l.o.g. both parties' interactions consist in choosing from a
set of generalised measurements to perform on their local system.
This all results in the channel $\mathcal{M} = W[P, \mathcal{N}] \in \mathcal{C}(Q \to R)$.
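
The wiring map can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary encoding of $P$ and $\mathcal{N}$, the function name `wire`, and the example channel (a binary symmetric channel with flip probability 1/4) are hypothetical choices, not taken from the paper. The correlation used is from class NC: Alice copies $q$ to $x$ and Bob copies $y$ to $r$, so the wired channel should coincide with $\mathcal{N}$ itself.

```python
from fractions import Fraction
from itertools import product

def wire(P, N, Q, X, Y, R):
    """Return M(r|q) = sum over x, y of P(x, r | q, y) * N(y | x).

    P maps (x, r, q, y) to a probability and N maps (y, x) to a probability;
    missing keys are treated as zero.
    """
    return {
        (r, q): sum(P.get((x, r, q, y), 0) * N.get((y, x), 0)
                    for x, y in product(X, Y))
        for r, q in product(R, Q)
    }

# Hypothetical example channel: binary symmetric, flip probability 1/4.
X = Y = Q = R = (0, 1)
N = {(y, x): Fraction(3, 4) if y == x else Fraction(1, 4)
     for y, x in product(Y, X)}

# A correlation in class NC: Alice copies q to x, Bob copies y to r.
P = {(x, r, q, y): Fraction(int(x == q and r == y))
     for x, r, q, y in product(X, R, Q, Y)}

M = wire(P, N, Q, X, Y, R)

# Each column of M is a probability distribution over r.
column_sums = [sum(M[(r, q)] for r in R) for q in Q]
```

With trivial local operations the wiring reproduces the resource channel, which is a quick sanity check of the index conventions in the formula for $\mathcal{M}(r|q)$.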

Since classes of correlations are closed under local
operations, a channel $\mathcal{M} \in \mathcal{C}(Q \to R)$ can be exactly
simulated by a single use of $\mathcal{N} \in \mathcal{C}(X \to Y)$ and
correlations in $\Omega$ if and only if

$$\mathcal{M} \in W[\Omega(Q \to X, Y \to R), \mathcal{N}].$$

Now, we can ask for the optimal use of one channel
to simulate another one in the presence of some class
of correlations $\Omega$. In this paper, we shall concentrate
on the simulation of perfect (i.e. identity) channels
by noisy ones ("zero-error capacity") and the reverse
("exact simulation cost").

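Definition 1. For a classical channel $\mathcal{N} \in \mathcal{C}(X \to Y)$,
and free correlations from the class $\Omega$, let $c_0^{\Omega}(\mathcal{N})$
denote the maximum alphabet size $c$ such that one
symbol from the alphabet can be sent without error
using $\Omega$ and a single use of $\mathcal{N}$:

$$c_0^{\Omega}(\mathcal{N}) := \max\{c : \mathrm{id}_c \in W[\Omega([c] \to X, Y \to [c]), \mathcal{N}]\},$$

where $\mathrm{id}_c$ is the classical identity channel on $c$
symbols.

Since clearly $c_0^{\Omega}(\mathcal{N}_1 \otimes \mathcal{N}_2) \ge c_0^{\Omega}(\mathcal{N}_1)\, c_0^{\Omega}(\mathcal{N}_2)$,
Fekete's lemma guarantees existence of the $\Omega$-assisted
zero-error capacity of a channel $\mathcal{N}$ defined
by

$$C_0^{\Omega}(\mathcal{N}) := \lim_{n \to \infty} \frac{1}{n} \log c_0^{\Omega}(\mathcal{N}^{\otimes n}).$$

In this paper "log" is base 2 so this is the capacity
in bits. We use "ln" for the natural logarithm.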

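Definition 2. For a classical channel $\mathcal{N} \in \mathcal{C}(X \to Y)$,
and free correlations in class $\Omega$, let $k^{\Omega}(\mathcal{N})$
denote the minimum alphabet size $k$ such that perfect
transmission of one symbol of the alphabet allows
exact simulation of one use of the channel:

$$k^{\Omega}(\mathcal{N}) := \min\{k : \mathcal{N} \in W[\Omega(X \to [k], [k] \to Y), \mathrm{id}_k]\}.$$

Similarly, we define

$$K^{\Omega}(\mathcal{N}) := \lim_{n \to \infty} \frac{1}{n} \log k^{\Omega}(\mathcal{N}^{\otimes n})$$

as the asymptotic rate at which perfect classical
bits must be transmitted to perfectly simulate $\mathcal{N}$, if
correlations in class $\Omega$ are free. The existence of the
limit is once more guaranteed by Fekete's lemma,
because $k^{\Omega}$ is clearly submultiplicative:

$$k^{\Omega}(\mathcal{N}_1 \otimes \mathcal{N}_2) \le k^{\Omega}(\mathcal{N}_1)\, k^{\Omega}(\mathcal{N}_2).$$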



A. Structure of the paper 

The classical reverse Shannon theorem [4] assures
us that in a setting of asymptotically vanishing simulation
errors, all channels can reversibly simulate
each other when shared randomness between sender
and receiver is freely available: the rate at which $\mathcal{N}_1$
can simulate $\mathcal{N}_2$ being the ratio of their Shannon
capacities, $C(\mathcal{N}_1)/C(\mathcal{N}_2)$. This remains true when
entanglement and even more general non-signalling
resources are shared by sender and receiver.

The exact coding and simulation problem will be
shown to have a much more complex structure. In the
next section we review some of the classical theory
of zero-error coding, discuss the correlation-assisted
zero-error quantities $c_0^{\Omega}$ and $C_0^{\Omega}$, and then show
several separations between them for different classes
$\Omega$ of assisting correlation. The most striking results
here are a complete solution for the non-signalling
assisted case and the construction of channels where
entanglement assists for the one-shot scenario (i.e.
where $c_0^{\mathrm{SE}} > c_0$).

In section IV we explore the quantities $k^{\Omega}$,
which we show to be all different (in general) for
$\Omega \in \{\mathrm{NC}, \mathrm{SR}, \mathrm{SE}, \mathrm{NS}\}$; and the simulation rates
$K^{\Omega}$, which turn out to be all the same for $\Omega \in
\{\mathrm{SR}, \mathrm{SE}, \mathrm{NS}\}$, and indeed are given by a simple
formula. We even find a kind of combinatorial reverse
Shannon theorem for zero-error communication/noisy
channel simulation in the presence of general non-signalling
correlations: the simulation rate minimised
over all channels with the same pattern of zeroes
as the matrix $\mathcal{N}(y|x)$ is the same as the non-signalling-assisted
zero-error capacity of $\mathcal{N}$.

We conclude with some open questions. 

III. Assisted zero-error capacities 

A. Local operations and shared randomness 

We start with the least powerful resources, NC
and SR. The former simply describes arbitrary encoding
and decoding maps. Shared randomness doesn't
change anything since any value of the shared randomness
will have to yield a zero-error coding if
the randomised protocol does. For the same reason
nothing is lost by requiring deterministic encoding
and decoding maps, so the coding can be fully
specified by giving a subset of input symbols to use as
codewords. Thus we are in Shannon's original zero-error
setting [1], and we shall write $c_0 := c_0^{\mathrm{NC}} = c_0^{\mathrm{SR}}$
and $C_0 := C_0^{\mathrm{NC}} = C_0^{\mathrm{SR}}$.
The fundamental observation is that, for zero-error
coding over a channel $\mathcal{N} \in \mathcal{C}(X \to Y)$, two symbols
can both be used as codewords only if they are not
confusable, that is, only if the corresponding output
distributions have disjoint support. Therefore, a zero-error
code is just a set of pairwise non-confusable
input symbols in $X$, and $c_0(\mathcal{N})$ is the largest size of
such a set.

In general, it is not hard to see that for any of our
resources $\Omega$, only the pattern of zeroes in $\mathcal{N}(y|x)$
can affect $c_0^{\Omega}$ and $C_0^{\Omega}$, so that the zero/one matrix
$\lceil \mathcal{N}(y|x) \rceil$ encodes all the relevant information. This
motivates the introduction of the following combinatorial
representations of channels.

Definition 3. The hypergraph $H(\mathcal{N})$ of a channel
$\mathcal{N} \in \mathcal{C}(X \to Y)$ has vertex set $X$ and hyperedges

$$E(H(\mathcal{N})) := \{e_y = \{x : \mathcal{N}(y|x) > 0\} : \forall y \in Y\}$$

capturing the equivocation of each output symbol $y \in
Y$.

Note that different output symbols can give rise to
the same hyperedge, so that the number of hyperedges
may be less than the number of output symbols.

Looking back at Definition 1, let $P(\hat z|z, y; x)$
denote the probability distribution on Bob's output
from the correlation conditional on Alice having input
$z$, Bob having input $y$, and Alice having obtained
output $x$. This is not well defined if the output $x$ never
occurs for the input $z$, and in this case we set $P(\hat z|z, y; x) = 0$
(so it is not in fact a distribution). When Bob
obtains an output $y$ he knows that there is non-zero
probability that Alice made input $x$ iff it belongs to
the hyperedge $e_y$. The correlation $P$ yields a zero-error
coding iff $P(\hat z|z, y; x)\, P(\hat z|z', y; x) = 0$ for
all $x \in e_y$ whenever $z \ne z'$, for every hyperedge
$e_y$ in $E(H(\mathcal{N}))$. Therefore, the correlation-assisted
zero-error capacities depend only on the channel
hypergraph.

To compute the unassisted zero-error capacity an
even coarser representation of the channel will suffice:

Definition 4. The confusability graph $G(\mathcal{N})$ of a
channel $\mathcal{N} \in \mathcal{C}(X \to Y)$ has vertices $X$ and an
edge between input symbols $x$ and $x'$ iff they are
confusable, i.e. $\sum_{y \in Y} \mathcal{N}(y|x)\, \mathcal{N}(y|x') > 0$.

With this notation, $c_0(\mathcal{N})$ is simply $\alpha(G(\mathcal{N}))$, the
independence number of $G(\mathcal{N})$.
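
As a rough illustration of Definitions 3 and 4, the sketch below builds the hyperedges and the confusability graph of a channel and computes $\alpha$ by brute force. The helper names and the example "pentagon" channel (input $x \in \mathbb{Z}_5$ received as $x$ or $x+1$ modulo 5, each with probability 1/2) are assumptions made for this example; the exhaustive search is only sensible for very small alphabets.

```python
from itertools import combinations

def hyperedges(N, X, Y):
    """Hyperedges e_y = {x : N(y|x) > 0}; duplicate hyperedges are merged."""
    return {frozenset(x for x in X if N.get((y, x), 0) > 0) for y in Y}

def confusability_edges(N, X, Y):
    """Pairs {x, x'} lying in a common hyperedge, i.e. confusable inputs."""
    return {frozenset(pair)
            for e in hyperedges(N, X, Y)
            for pair in combinations(sorted(e), 2)}

def independence_number(vertices, edges):
    """Brute-force alpha(G): size of the largest vertex set containing no edge."""
    vertices = list(vertices)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if not any(frozenset(p) in edges for p in combinations(subset, 2)):
                return size
    return 0

# Hypothetical pentagon channel: x is received as x or x + 1 (mod 5).
X = Y = range(5)
N = {(y, x): 0.5 for x in X for y in (x, (x + 1) % 5)}

H = hyperedges(N, X, Y)
E = confusability_edges(N, X, Y)
c0 = independence_number(X, E)
```

For this channel the hyperedges are the five pairs $\{y-1, y\}$, the confusability graph is the 5-cycle, and the one-shot unassisted quantity comes out as 2.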



Clearly $G(\mathcal{N})$ can be obtained from $H(\mathcal{N})$ by
taking the vertex set of $H$ as the vertex set of $G$ and
joining vertices with an edge iff there is a hyperedge
of $H$ containing both. On the other hand, given
a graph $G$ with vertex set $X$, there are generally
many hypergraphs on $X$ which are mapped to $G$
by this rule. Hypergraphs on a given vertex set form
a lattice when ordered by inclusion of their sets of
hyperedges. The supremum of the set of hypergraphs
with confusability graph $G$ is the clique hypergraph
of $G$, $\chi(G)$, whose hyperedges are all of the cliques
in $G$. From the point of view of zero-error coding,
extra hyperedges can only be a bad thing, and this
represents the worst case: for all hypergraphs $H$ with
a given confusability graph $G$, $c_0^{\Omega}(H) \ge c_0^{\Omega}(\chi(G))$.

For two graphs $G_1, G_2$ with vertex sets $X_1, X_2$
their strong product $G_1 \boxtimes G_2$ is the graph on $X_1 \times X_2$
with an edge $\{(x_1, z_1), (x_2, z_2)\}$ iff
($\{x_1, x_2\} \in E(G_1)$ and $\{z_1, z_2\} \in E(G_2)$) or
($x_1 = x_2$ and $\{z_1, z_2\} \in E(G_2)$) or
($\{x_1, x_2\} \in E(G_1)$ and $z_1 = z_2$).
In terms of confusability graphs, $G(\mathcal{N}_1 \otimes \mathcal{N}_2) =
G(\mathcal{N}_1) \boxtimes G(\mathcal{N}_2)$. For two hypergraphs $H_i$ with vertex
sets $X_i$ and edges $E_i$ ($i = 1, 2$), we define the
product $H_1 \otimes H_2$ on vertex set $X_1 \times X_2$ to have
the hyperedges $\{e \times f : \forall e \in E_1, f \in E_2\}$. The
hypergraph of a product channel is the product of the
individual hypergraphs, and the clique hypergraph of
a strong graph product is the product of the individual
clique hypergraphs.

The Shannon capacity of a graph is the asymptotic
behaviour of the independence number of the strong
product of $n$ copies

$$\Theta(G) := \lim_{n \to \infty} \sqrt[n]{\alpha(G^{\boxtimes n})}.$$

The zero-error capacity of $\mathcal{N}$ is the same quantity but
measured in bits per channel use

$$C_0(\mathcal{N}) = \log \Theta(G(\mathcal{N})).$$

The smallest example where the supermultiplicativity
of $c_0$ ($= \alpha$) is strict is the pentagon graph $G_5$, for
which $c_0(G_5) = 2$ but $c_0(G_5^{\boxtimes 2}) = 5$. Shannon
conjectured that $\Theta(G_5) = \sqrt{5}$, which was only shown
to be true by Lovász [19].
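
The two pentagon values quoted above can be checked by direct computation. The following is a minimal sketch, assuming the pentagon is encoded as the 5-cycle on $\{0, \dots, 4\}$; the function names and the branching search are choices made for this example and are not the paper's method. The size-5 code $\{(i, 2i \bmod 5)\}$ used as a witness is the classical one usually attributed to Shannon.

```python
from itertools import product

def cycle(n):
    """Adjacency sets of the n-cycle on vertices 0..n-1."""
    return {v: frozenset({(v - 1) % n, (v + 1) % n}) for v in range(n)}

def strong_product(adj1, adj2):
    """Distinct pairs are adjacent iff each coordinate is equal or adjacent."""
    def close(adj, a, c):
        return a == c or c in adj[a]
    verts = list(product(adj1, adj2))
    return {(a, b): frozenset((c, d) for (c, d) in verts
                              if (c, d) != (a, b)
                              and close(adj1, a, c) and close(adj2, b, d))
            for (a, b) in verts}

def alpha(vertices, adj):
    """Exact independence number by branching on one vertex (small graphs only)."""
    if not vertices:
        return 0
    v = next(iter(vertices))
    rest = vertices - {v}
    take = 1 + alpha(rest - adj[v], adj)   # v is in the independent set
    if not (adj[v] & rest):
        return take                        # v is isolated: taking it is optimal
    return max(take, alpha(rest, adj))     # otherwise also try leaving v out

G5 = cycle(5)
G5x2 = strong_product(G5, G5)

alpha_one = alpha(frozenset(G5), G5)
alpha_two = alpha(frozenset(G5x2), G5x2)

# A witness for the two-copy value: the code {(i, 2i mod 5)}.
code = [(i, (2 * i) % 5) for i in range(5)]
code_is_independent = all(b not in G5x2[a] for a in code for b in code)
```

The search should return 2 for one copy and 5 for two copies, so that $\sqrt{5} \le \Theta(G_5)$, in line with the strict supermultiplicativity noted in the text.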

Determining whether $c_0(\mathcal{N})$ is greater than a given
integer $k$ is NP-complete (indeed it is trivially equivalent
to $k$-CLIQUE). Whether $\Theta$ is larger than some
number is not even known to be decidable.

Shannon [1] found an upper bound on the zero-error
capacity by considering feedback assistance. In
this scenario, as soon as Bob receives an output $y$
from the channel, Alice gets to know this $y$ with
perfect reliability. While this is no advantage if only
a single use of the channel is made, it is sometimes
useful given multiple uses (an observation Shannon
attributes to Elias [1]). Shannon goes on to give a
general formula for the asymptotic feedback-assisted
zero-error capacity $C_{0\mathrm{FB}}$. It is zero whenever $C_0(\mathcal{N})$
is zero (i.e. whenever the confusability graph of the
channel is complete) but otherwise is precisely the logarithm of the
fractional packing number of the channel hypergraph.

Definition 5. A fractional packing of a hypergraph
$H$ with vertex set $V(H) = X$ is an assignment of
non-negative weights $v(x) \le 1$ to all vertices $x$ such
that

$$\forall e \in E(H): \quad \sum_{x \in e} v(x) \le 1.$$

A fractional covering of a hypergraph $H$ with vertex
set $V(H) = X$ is an assignment of non-negative
weights $w(e) \le 1$ to all hyperedges $e \in E(H)$ such
that

$$\forall x \in X: \quad \sum_{e \ni x} w(e) \ge 1.$$

(For weights in $\{0, 1\}$ we recover the combinatorial
notions of packing and covering.)

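The fractional packing number $\alpha^*(H)$ is the maximum
total weight allowed in a fractional packing of $H$
and the fractional covering number $\omega^*(H)$ is the minimum
total weight required for a fractional covering
of $H$. These are clearly dual linear programs, which
for a channel hypergraph $H(\mathcal{N})$ have the formulation

$$\alpha^*(H(\mathcal{N})) = \max\Big\{ \sum_{x \in X} v(x) : \forall x \in X,\ v(x) \ge 0,\ \ \forall y \in Y,\ \sum_{x \in X} \lceil \mathcal{N}(y|x) \rceil\, v(x) \le 1 \Big\},$$

$$\omega^*(H(\mathcal{N})) = \min\Big\{ \sum_{y \in Y} w(y) : \forall y \in Y,\ w(y) \ge 0,\ \ \forall x \in X,\ \sum_{y \in Y} \lceil \mathcal{N}(y|x) \rceil\, w(y) \ge 1 \Big\}.$$

Note that the possibility of redundant hyperedges
in this representation (as compared with the purer one
in terms of sets) has no effect on either quantity.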
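
Because the two programs are dual, a feasible packing and a feasible covering with the same total weight certify each other's optimality (by weak duality). The sketch below checks such a certificate with exact rational arithmetic; the pentagon hypergraph with hyperedges $\{y-1, y\}$ modulo 5 and the uniform weights are assumptions chosen for illustration, and no LP solver is involved.

```python
from fractions import Fraction

def is_packing(v, hyperedges):
    """Weights are non-negative and every hyperedge carries total weight <= 1."""
    return (all(wt >= 0 for wt in v.values())
            and all(sum(v[x] for x in e) <= 1 for e in hyperedges))

def is_covering(w, hyperedges, X):
    """Weights are non-negative and every vertex is covered with weight >= 1."""
    return (all(wt >= 0 for wt in w.values())
            and all(sum(w[e] for e in hyperedges if x in e) >= 1 for x in X))

# Hypothetical example: pentagon hypergraph with hyperedges {y - 1, y} mod 5.
X = range(5)
H = [frozenset({(y - 1) % 5, y}) for y in range(5)]

v = {x: Fraction(1, 2) for x in X}   # candidate fractional packing
w = {e: Fraction(1, 2) for e in H}   # candidate fractional covering

packing_value = sum(v.values())
covering_value = sum(w.values())
certified = (is_packing(v, H) and is_covering(w, H, X)
             and packing_value == covering_value)
```

Here both totals equal 5/2, so under these assumptions the fractional packing and covering numbers of the pentagon hypergraph are both 5/2.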

The fractional packing problem is always feasible.
On the other hand, the fractional covering problem
is feasible if and only if the union of all hyperedges
covers $X$. Where both are feasible $\alpha^*(H) = \omega^*(H)$
by the strong duality theorem for linear programs. In
particular, this holds for a channel hypergraph, since
the fractional covering problem is always feasible (as
every input symbol always results in some output
symbol occurring).

From the definition of $\alpha^*$, we have $\alpha(G) \le
\alpha^*(G) := \alpha^*(\chi(G)) \le \alpha^*(H)$ for any hypergraph $H$
with confusability graph $G$. But it turns out that $\alpha^*$
is even an upper bound on $\Theta(G)$:

Proposition 6. $\alpha^*$ is multiplicative with respect to
the direct hypergraph product: $\alpha^*(H_1 \otimes H_2) =
\alpha^*(H_1)\, \alpha^*(H_2)$.

Proof: To show multiplicativity, strong duality
means that it suffices to show supermultiplicativity
of $\alpha^*$ and submultiplicativity of $\omega^*$, i.e.

$$\alpha^*(H_1 \otimes H_2) \ge \alpha^*(H_1)\, \alpha^*(H_2),$$
$$\omega^*(H_1 \otimes H_2) \le \omega^*(H_1)\, \omega^*(H_2).$$

These are easy, because it is straightforward to confirm
that the tensor product of two feasible vectors
$v_1$ and $v_2$ (dual feasible vectors $w_1$ and $w_2$) for $H_1$
and $H_2$, respectively, is feasible (dual feasible) for
$H_1 \otimes H_2$. ■
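
The feasibility step in this proof is easy to check numerically on a small instance. The sketch below is an illustration only: the two hypergraphs, the packings `v1` and `v2`, and the helper names are hypothetical. It forms the product hypergraph with hyperedges $e \times f$ and verifies that the tensor product of two fractional packings is again a fractional packing whose total weight is the product of the totals.

```python
from fractions import Fraction
from itertools import product

def is_packing(v, hyperedges):
    """Weights are non-negative and every hyperedge carries total weight <= 1."""
    return (all(wt >= 0 for wt in v.values())
            and all(sum(v[x] for x in e) <= 1 for e in hyperedges))

def product_hypergraph(E1, E2):
    """Hyperedges e x f of the product hypergraph on X1 x X2."""
    return [frozenset(product(e, f)) for e in E1 for f in E2]

def tensor(v1, v2):
    """Tensor product of two weight vectors: v(x, z) = v1(x) * v2(z)."""
    return {(x, z): v1[x] * v2[z] for x in v1 for z in v2}

# Hypothetical example 1: pentagon hypergraph with uniform weights 1/2.
E1 = [frozenset({(y - 1) % 5, y}) for y in range(5)]
v1 = {x: Fraction(1, 2) for x in range(5)}

# Hypothetical example 2: two overlapping hyperedges on four vertices.
E2 = [frozenset({0, 1, 2}), frozenset({2, 3})]
v2 = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1)}

v12 = tensor(v1, v2)
feasible = is_packing(v12, product_hypergraph(E1, E2))
total = sum(v12.values())
```

The weight on each product hyperedge factorises as (weight on $e$) times (weight on $f$), which is why feasibility is inherited; the same argument with the inequalities reversed handles coverings.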
Therefore, for any integer $n$, $\alpha(G^{\boxtimes n}) \le
\alpha^*(G^{\boxtimes n}) = \alpha^*(\chi(G)^{\otimes n}) = (\alpha^*(G))^n$, so $\Theta(G) \le
\alpha^*(G)$. But this bound is often not tight. For example
for the pentagon $G_5$, it yields $\Theta(G_5) \le \frac{5}{2}$; the above
two-copy consideration shows on the other hand that
$\Theta(G_5) \ge \sqrt{5}$. The celebrated result of Lovász [19]
says that the lower bound is tight, $\Theta(G_5) = \sqrt{5}$. He
proved this by introducing another, tighter, but still
multiplicative relaxation for $\alpha(G)$, denoted $\vartheta(G)$.

B. Assistance by non-signalling correlations 

Now that we have reviewed the state of the art regarding
$c_0$ and $C_0$, we go on to present our complete
solution for $c_0^{\mathrm{NS}}$ and $C_0^{\mathrm{NS}}$.

Theorem 7. For a classical channel $\mathcal{N} \in \mathcal{C}(X \to Y)$
with hypergraph $H(\mathcal{N})$,

$$c_0^{\mathrm{NS}}(\mathcal{N}) = \lfloor \alpha^*(H(\mathcal{N})) \rfloor,$$

where $\alpha^*(H(\mathcal{N}))$ is the fractional packing number of
$H(\mathcal{N})$. Being a linear program, this can be efficiently
computed from the channel.
Furthermore, since $\alpha^*$ is multiplicative, the NS-assisted
zero-error capacity of a channel is

$$C_0^{\mathrm{NS}}(\mathcal{N}) = \log \alpha^*(H(\mathcal{N})).$$

Proof: $\mathcal{N} \in \mathcal{C}(X \to Y)$ can exactly simulate
a $g$-message identity channel with non-signalling
correlations if and only if there exists $P$ in $\mathrm{NS}([g] \to
X, Y \to [g])$ such that

$$\sum_{x \in X,\, y \in Y} P(x, \hat z|z, y)\, \mathcal{N}(y|x) = \begin{cases} 1 & \text{if } \hat z = z, \\ 0 & \text{if } \hat z \ne z. \end{cases}$$

Without loss of generality, we can assume that a
simplified form of non-signalling correlation is used:
Suppose some $P$ satisfies the above condition. Then
the symmetry of the identity channel under simultaneous
permutation of the input and output alphabets
means that we can always construct a new $P'$ which
is symmetrised by the following 'twirling' procedure

$$P'(x, \hat z|z, y) := \frac{1}{g!} \sum_{\pi \in S_g} P(x, \pi(\hat z)|\pi(z), y),$$

where $S_g$ is the symmetric group on $g$ elements and
$\pi(z)$ is the image of $z$ under the permutation $\pi$. This
clearly simulates the same channel as $P$, but it is
highly symmetric in that

$$P'(x, \hat z|z, y) = \begin{cases} D_{xy} & \text{if } \hat z = z, \\ Q_{xy} & \text{if } \hat z \ne z. \end{cases}$$
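With this simplification in mind, we maximize $g$
such that a valid non-signalling correlation $P'$ allows
the simulation of a $g$-message identity channel. We
enumerate the constraints on $P'$ in terms of $D$ and
$Q$.

(1) $P'$ is a valid conditional probability distribution
iff

$$\forall x, y: \quad D_{xy} \ge 0, \quad Q_{xy} \ge 0$$

and

$$\forall y: \quad \sum_{x \in X} \big(D_{xy} + (g - 1) Q_{xy}\big) = 1.$$

(2) The non-signalling condition from Bob to Alice
is given by:

$$\forall y: \quad D_{xy} + (g - 1) Q_{xy} = u_x$$

for some $u_x$, whereas the condition that Alice cannot
signal to Bob is

$$\forall y: \quad \sum_{x \in X} D_{xy} = \sum_{x \in X} Q_{xy}.$$

(3) The resulting channel is the $g$-message identity
iff

$$\sum_{x \in X,\, y \in Y} D_{xy}\, \mathcal{N}(y|x) = 1 \quad \text{and} \quad \sum_{x \in X,\, y \in Y} Q_{xy}\, \mathcal{N}(y|x) = 0.$$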

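Eliminating $D$ using condition (2), the full set of
constraints (in terms of $Q$ and $u$) can be simplified:

$$\forall x, y: \quad Q_{xy} \ge 0, \quad u_x \ge (g - 1) Q_{xy},$$

$$\sum_{x \in X} u_x = 1, \qquad \forall y: \ \sum_{x \in X} Q_{xy} = \frac{1}{g},$$

$$\text{and} \quad \sum_{x \in X,\, y \in Y} Q_{xy}\, \mathcal{N}(y|x) = 0.$$

$c_0^{\mathrm{NS}}(\mathcal{N})$ is the largest integer not exceeding the
largest real number $g$ satisfying these constraints,
which we now show is the $\alpha^*(H(\mathcal{N}))$ of the theorem.
Defining $T_{xy} := (g - 1) Q_{xy}$, the largest feasible value
of $g$ is

$$g = \max\Big\{ \frac{1}{1 - s} : \forall y,\ \sum_{x \in X} T_{xy} = s,\ \ T_{xy} \ge 0,\ \ u_x \ge T_{xy},\ \ \sum_{x \in X} u_x = 1,\ \ \sum_{x \in X,\, y \in Y} T_{xy}\, \mathcal{N}(y|x) = 0 \Big\}.$$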

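By a simple application of the linear-fractional programming
technique [6] this optimisation can be
recast as a linear program: Making the substitutions
$t := 1/(1 - s)$, $T'_{xy} := t\, T_{xy}$, $v(x) := t\, u_x$ yields

$$\max\Big\{ t : v(x) \ge T'_{xy} \ge 0,\ \ \sum_{x \in X} v(x) = t,\ \ \forall y,\ \sum_{x \in X} \big(v(x) - T'_{xy}\big) = 1,\ \ \sum_{x \in X,\, y \in Y} T'_{xy}\, \mathcal{N}(y|x) = 0 \Big\}.$$

This is equivalent to the linear program

$$g = \max\Big\{ \sum_{x \in X} v(x) : v(x) \ge T'_{xy} \ge 0,\ \ \forall y,\ \sum_{x \in X} \big(v(x) - T'_{xy}\big) \le 1,\ \ \sum_{x \in X,\, y \in Y} T'_{xy}\, \mathcal{N}(y|x) = 0 \Big\}.$$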
This would just be a reorganisation, except that we
have also replaced the equality constraints on the
second line with inequalities. This doesn't change the
value of the linear program: In any optimal solution
with the inequalities, the sum over $v(x)$ will be at
least one. Therefore, lowering the values of the $T'_{xy}$,
a solution to the LP where the equalities hold can be
found which has the same objective value.

Finally, note that the $T'_{xy}$ are redundant in the
above formulation. Indeed, we may always set $T'_{xy} =
v(x)$ unless we are forced to take $T'_{xy} = 0$ due to
$\mathcal{N}(y|x) > 0$ or, equivalently, due to $\lceil \mathcal{N}(y|x) \rceil = 1$.
Therefore,

$$g = \max\Big\{ \sum_{x \in X} v(x) : v(x) \ge 0\ \ \forall x \in X,\ \ \forall y,\ \sum_{x \in X} \lceil \mathcal{N}(y|x) \rceil\, v(x) \le 1 \Big\},$$

precisely the fractional packing number of $H(\mathcal{N})$. ■

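Corollary 8. If a channel $\mathcal{N}$ with $n$ inputs has at
most $m$ non-zero entries $\mathcal{N}(y|x)$ for each $y$ (i.e. the
hyperedges of the equivocation hypergraph are all of size
$\le m$), then

$$c_0^{\mathrm{NS}}(\mathcal{N}) \ge \left\lfloor \frac{n}{m} \right\rfloor \quad \text{and} \quad C_0^{\mathrm{NS}}(\mathcal{N}) \ge \log \frac{n}{m}.$$

The proof is by checking that the assignment $v(x) =
\frac{1}{m}$ is feasible. □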

We now show that $C_0^{\mathrm{NS}}$ can be arbitrarily larger
than $C_0$. In fact, there are channels for which the
latter is 0 while the former is positive!

Let $\binom{[n]}{m}$ denote the set of all size-$m$ subsets of
$[n]$. For all $n > m \ge 2$, define the channels $S_{n,m} \in
\mathcal{C}\big([n] \to \binom{[n]}{m}\big)$: each maps $x \in [n]$ to a random subset
$y \subset [n]$ of cardinality $m$ containing $x$, i.e.

$$S_{n,m}(y|x) = \begin{cases} \binom{n-1}{m-1}^{-1} & \text{if } x \in y, \\ 0 & \text{if } x \notin y. \end{cases}$$

For all these channels, $C_0(S_{n,m}) = C_{0\mathrm{FB}}(S_{n,m}) = 0$
because any two inputs $x$ and $x'$ are contained in
a common set, hence they are confusable. (In other
words, the confusability graph is the complete graph
$K_n$.) On the other hand, by Corollary 8 all of
these channels have $C_0^{\mathrm{NS}}(S_{n,m}) \ge \log \frac{n}{m}$, a strictly
positive non-signalling-assisted capacity.
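
The claims about $S_{n,m}$ can be checked mechanically for small parameters. The sketch below is a hedged illustration: the function name `subset_channel`, the encoding of $[n]$ as $\{1, \dots, n\}$, and the choice $n = 5$, $m = 2$ are assumptions. Besides the packing $v(x) = 1/m$ from Corollary 8, it also tests the uniform covering $w(y) = \binom{n-1}{m-1}^{-1}$, which is an addition made here (not stated in the text) and which has the same total weight $n/m$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def subset_channel(n, m):
    """S_{n,m}(y|x) = 1 / C(n-1, m-1) if x in y, for y an m-subset of {1..n}."""
    X = range(1, n + 1)
    Y = [frozenset(y) for y in combinations(X, m)]
    p = Fraction(1, comb(n - 1, m - 1))
    N = {(y, x): p for y in Y for x in y}   # zero entries are simply omitted
    return X, Y, N

n, m = 5, 2
X, Y, N = subset_channel(n, m)

# Every column of the channel sums to one.
normalised = all(sum(N.get((y, x), 0) for y in Y) == 1 for x in X)

# Any two inputs share a subset, so the confusability graph is complete.
complete = all(any(x in y and xp in y for y in Y)
               for x, xp in combinations(X, 2))

# Corollary 8: v(x) = 1/m is a feasible fractional packing (hyperedge = y).
v = {x: Fraction(1, m) for x in X}
packing_ok = all(sum(v[x] for x in y) <= 1 for y in Y)

# Assumption of this sketch: uniform weights give a feasible covering too.
w = {y: Fraction(1, comb(n - 1, m - 1)) for y in Y}
covering_ok = all(sum(w[y] for y in Y if x in y) >= 1 for x in X)

packing_value = sum(v.values())
covering_value = sum(w.values())
```

Since the packing and covering totals agree, weak duality suggests that the bound of Corollary 8 is attained for these channels, i.e. the fractional packing number is exactly $n/m$ in this example.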

The smallest parameters for which this effect can
be seen on the single-shot level are $n = 4$ and $m = 2$:
$S_{4,2}$ is a channel with 4 inputs and 6 outputs, and
Theorem 7 gives $C_0^{\mathrm{NS}}(S_{4,2}) = 1$. How can this be?
Define a non-signalling correlation $P \in \mathrm{NS}\big(\{0, 1\} \to
[4], \binom{[4]}{2} \to \{0, 1\}\big)$ as follows. Alice's input is a bit $z$,
her output $x'$ is a random element of $[4]$. Bob's input
is a subset $y' \in \binom{[4]}{2}$. If $x' \in y'$ then Bob's output
bit $\hat z$ is $z$ and otherwise is $\neg z$. Clearly, Bob's output
is independent of Alice's input and vice versa so it
is indeed non-signalling.

Suppose Alice wires her output into the channel
$S_{4,2}$ (so $x' = x$) and Bob uses the output of $S_{4,2}$
as his input to $P$ (so $y' = y$). The behaviour of
the channel ensures that $y'$ will always contain $x'$
and therefore Bob's output $\hat z$ will always be equal
to $z$. A bit is transmitted from Alice to Bob with
perfect reliability, and that despite the fact that
any two inputs of the channel cannot be told apart
with certainty by Bob!
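
This protocol is small enough to verify exhaustively. The sketch below encodes the correlation $P$ and the channel $S_{4,2}$ as described in the text (with $[4]$ represented as $\{0, 1, 2, 3\}$, an encoding chosen for this example), checks both non-signalling conditions on the marginals, and then wires $P$ to the channel using the formula for $W[P, \mathcal{N}]$.

```python
from fractions import Fraction
from itertools import combinations, product

X = range(4)                                        # the set [4], as 0..3
Y = [frozenset(y) for y in combinations(X, 2)]      # the six 2-subsets
Z = (0, 1)                                          # message bit / output bit

# Channel S_{4,2}: x is mapped to a uniformly random 2-subset containing it.
N = {(y, x): Fraction(1, 3) if x in y else Fraction(0) for y in Y for x in X}

def P(x, r, z, y):
    """P(x, r | z, y): x is uniform on [4]; r = z if x in y, else the flipped bit."""
    target = z if x in y else 1 - z
    return Fraction(1, 4) if r == target else Fraction(0)

# Bob cannot signal to Alice: the marginal of x does not depend on y.
alice_marginal_ok = all(sum(P(x, r, z, y) for r in Z) == Fraction(1, 4)
                        for x, z, y in product(X, Z, Y))

# Alice cannot signal to Bob: the marginal of r does not depend on z.
bob_marginal_ok = all(sum(P(x, r, 0, y) for x in X) == sum(P(x, r, 1, y) for x in X)
                      for r, y in product(Z, Y))

# Wire Alice's output into the channel and the channel output into Bob's input.
M = {(r, z): sum(P(x, r, z, y) * N[(y, x)] for x, y in product(X, Y))
     for r, z in product(Z, Z)}
```

The wired channel `M` should be the identity on one bit, even though every pair of inputs to $S_{4,2}$ is confusable.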

Whenever $C_0(\mathcal{N}) > 0$, the non-signalling-assisted
zero-error capacity $C_0^{\mathrm{NS}}(\mathcal{N})$ is precisely the same
as the feedback-assisted zero-error capacity. This
is especially remarkable because the corresponding
quantities for a finite number of channel uses are not
necessarily the same, and the proofs of the capacity
formulas are very different [1]. Also interesting is the
fact that non-signalling proves strictly more powerful
than feedback. In fact, when the capacities differ, the
feedback-assisted capacity must be zero.



C. Assistance by entanglement 

In [15] we show that, like $c_0(\mathcal{N})$ and unlike
$c_0^{\mathrm{NS}}(\mathcal{N})$, the one-shot (and hence also, asymptotic)
entanglement-assisted zero-error capacity depends
only on $G(\mathcal{N})$. An immediate corollary of this is that
if $C_0(\mathcal{N}) = 0$ then $C_0^{\mathrm{SE}}(\mathcal{N}) = 0$. Proposition 30 of the
appendix shows that these facts hold for assistance
by any class of correlations with a certain operational
property, which is possessed by SE but not NS. Here,
for the reader's convenience we repeat the proof of
[15].

Theorem 9. For any channel $\mathcal{N}$ with inputs $X$ and
outputs $Y$, $c_0^{\mathrm{SE}}(\mathcal{N}) = \max c$ subject to the constraint
that there exists a density matrix $\rho_B$ and positive
semidefinite operators $\beta_x^{(z)}$ for all $z \in [c]$, $x \in X$, on
some Hilbert space such that

$$\forall z: \quad \sum_{x \in X} \beta_x^{(z)} = \rho_B,$$

$$\forall z \ne z',\ \{x, x'\} \in E(G(\mathcal{N})): \quad \mathrm{Tr}\, \beta_x^{(z)} \beta_{x'}^{(z')} = 0.$$

Consequently, $c_0^{\mathrm{SE}}(\mathcal{N})$ depends only on $G(\mathcal{N})$.

Proof: We call the shared entangled state $\rho_{AB}$.
Without loss of generality, to send message $z$, Alice
performs a measurement with POVM elements
$\{M_x^{(z)} : x \in X\}$, and with probability
$p_x^{(z)} = \mathrm{Tr}\big[M_x^{(z)} (\mathrm{Tr}_B\, \rho_{AB})\big]$, obtains outcome $x$. Conditional
on the knowledge of $z$ and $x$, the residual state
of Bob's system is $\rho_x^{(z)} = \mathrm{Tr}_A\big[(M_x^{(z)} \otimes 1)\, \rho_{AB}\big] / p_x^{(z)}$.
Letting $\beta_x^{(z)} := p_x^{(z)} \rho_x^{(z)}$, for all messages $z$

$$\sum_{x \in X} \beta_x^{(z)} = \mathrm{Tr}_A\, \rho_{AB} = \rho_B,$$

reflecting the fact that without information from the
classical channel, Bob has no idea which message
Alice sent (i.e. causality is respected). Conversely,
any set of positive operators $\beta_x^{(z)}$ which satisfy this
condition for some $\rho_B$ can be realised by a suitable
choice of $\rho_{AB}$ and generalised measurements.

Alice puts the outcome $x$ into the channel $\mathcal{N}$. Bob
obtains the channel output $y$, in addition to a quantum
state left in his half of the entangled system. This
bipartite state on Bob's side is given by:

$$\sigma_z := \sum_{x \in X,\, y \in Y} \mathcal{N}(y|x)\, |y\rangle\langle y| \otimes \beta_x^{(z)}.$$

The encoding works if and only if Bob can distinguish
perfectly between all the $\sigma_z$, i.e. for all distinct
$z, z' \in [c]$

$$0 = \mathrm{Tr}\, \sigma_z \sigma_{z'} = \sum_{x, x', y, y'} \mathcal{N}(y|x)\, \mathcal{N}(y'|x')\, \delta_{yy'}\, \mathrm{Tr}\, \beta_x^{(z)} \beta_{x'}^{(z')} = \sum_{x, x' : \{x, x'\} \in E(G)} \Big( \sum_{y} \mathcal{N}(y|x)\, \mathcal{N}(y|x') \Big) \mathrm{Tr}\, \beta_x^{(z)} \beta_{x'}^{(z')}.$$

■

Entanglement can still help, though. As shown
in Theorem 13 (and previously in [15]) there are
channels with $c_0^{\mathrm{SE}}(\mathcal{N}) > c_0(\mathcal{N}) > 0$.

Whether there are channels $\mathcal{N}$ exhibiting an
asymptotic separation $C_0^{\mathrm{SE}}(\mathcal{N}) > C_0(\mathcal{N})$ remains an
open question at this time. The efficiently computable
formulae for $c_0^{\mathrm{NS}}$ and $C_0^{\mathrm{NS}}$ derived in the previous
section provide upper bounds on entanglement assistance
in both the one-shot and asymptotic cases,
but a tighter bound is known: Duan et al. [17] have
defined a generalisation of the Lovász theta function
[19] for quantum channels. It is multiplicative
for tensor products of channels and reduces to the
classical Lovász theta function when the channel is
classical. They show that this function is an upper
bound on the entanglement-assisted one-shot zero-error
capacity (for sending classical messages) for
any quantum channel. Therefore, the classical theta
function is an upper bound on $c_0^{\mathrm{SE}}$ for classical
channels. A short and direct proof of this fact was
derived independently by Beigi [16]. The bound is a
strict improvement over the fractional packing bound
and since $\vartheta$, like $\alpha^*$, is multiplicative, it too can be
immediately applied to the asymptotic rate:

$$C_0^{\mathrm{SE}}(\mathcal{N}) \le \log \vartheta(G(\mathcal{N})).$$

In terms of trying to decide whether separations
exist between $C_0^{\mathrm{SE}}$ and $C_0$, this is a rather frustrating
result because $\vartheta(G(\mathcal{N}))$ is typically also the best
bound we have on $C_0$! Exceptions to this have been
found by Haemers [11], but then the problem is to
determine whether entanglement-assisted protocols
exist which beat the best known upper bound on $C_0$
for those special cases, and even then, only a positive
answer would settle the general problem.

Another intriguing corollary of the result is that
for the channels with $c_0 < c_0^{\mathrm{SE}}$ made according to
our construction from [15], the Lovász theta function
coincides exactly with the lower bound on $c_0^{\mathrm{SE}}$
provided by the explicit protocol we give, so for
these channels we know the precise value of $C_0^{\mathrm{SE}}(\mathcal{N})$
and furthermore that it is achieved by repeating the
optimal protocol for a single use of the channel.

We now review the proof of the above statement as
well as the construction from [15] to which it applies.

Definition 10. Let $G$ be a graph with vertex set $X$.
An orthonormal representation $\Gamma$ of $G$ in $\mathbb{C}^d$ is an
assignment of unit vectors in $\mathbb{C}^d$ to the vertices of
$G$ such that if two vertices are connected by an edge
then their assigned vectors are orthogonal (where
orthogonality is with respect to the usual inner product
$\langle \cdot, \cdot \rangle$):

$$\forall x, x' \in X: \quad \langle \Gamma(x), \Gamma(x') \rangle = 0 \ \Leftarrow\ \{x, x'\} \in E(G).$$

Theorem 11. Suppose that $G$ is a graph with an
orthonormal representation in $\mathbb{C}^d$ whose vertices can
be partitioned into exactly $q$ cliques $\{\mathcal{K}_1, \ldots, \mathcal{K}_q\}$
each of size $d$. Then there is a one-shot zero-error
communication protocol assisted by a rank-$d$ maximally
entangled state, which shows that $c_0^{\mathrm{SE}}(G) \ge q$.
Also, $\vartheta(G) = q$, and since [16], [17] proved that
$c_0^{\mathrm{SE}}(G) \le \vartheta(G)$, we have $c_0^{\mathrm{SE}}(G) = q$.
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Proof: First, we describe the entanglement as- 
sisted protocol. Alice and Bob share ^ \j)A® 
Ij)^, with \i) the computational basis vectors for 
each party. The q cliques of size d which partition 
the vertices of the graph correspond to q complete 
orthonormal bases for given by Bz — {{T{x) : 
Vx € /Cz} for z = 1 to q. To encode the message z, 
Alice measures her half of the shared state along the 
basis B'l (obtained by conjugating each state in Bz)- 
If the outcome corresponds to x, Bob's subsystem is 
left in the state r(x). [] 

Alice inputs x to the channel. Bob's output y from $\mathcal{N}$ tells him a clique $e_y$ in G that contains x, which is not necessarily one of the cliques in the partition. So Bob's subsystem must be in one of the corresponding set of mutually orthogonal states $T(e_y)$. Therefore, he can perform a projective measurement on his subsystem to determine exactly which state he has, from which he can deduce x and, a fortiori, the symbol $z \in [q]$ which Alice chose, with certainty. 

Second, to obtain $\vartheta(G)$, note that it can only increase if edges are removed, and that it is multiplicative under the strong graph product, so we have 

$$\vartheta(G) \le \vartheta(\overline{K}_q \boxtimes K_d) = \vartheta(\overline{K}_q)\,\vartheta(K_d) = q,$$ 

where $K_n$ and $\overline{K}_n$ are the complete and empty graphs on n vertices, which have Lovász theta values of 1 and n, respectively. 

Using the result from [16], [17] that $c_0^{SE}(G) \le \vartheta(G)$, and putting both parts together, 

$$c_0^{SE}(G) = \vartheta(G) = q.$$ 



Definition 12. We call a set $Z = \{B_m\}_{m=1}^{q}$ of q complete orthogonal bases $B_m = \{|b_{mj}\rangle : j = 1, \ldots, d\}$ for $\mathbb{C}^d$ a KS basis set, if it is impossible to pick one vector from each basis so that no two are orthogonal. 

That such sets exist is a simple corollary of the Kochen-Specker theorem [13]. An example of a KS basis set with 6 bases for $\mathbb{C}^4$, taken from a proof of the Kochen-Specker theorem by Peres [14], is given in [15]. 

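Theorem 13. For any KS basis set $Z = \{B_m\}_{m=1}^{q}$ in $\mathbb{C}^d$ of q bases, one can construct a classical channel $\mathcal{N}_Z$ (with qd input symbols) with $c_0(\mathcal{N}_Z) < q$ and $c_0^{SE}(\mathcal{N}_Z) = q$. 

¹ If $T(x) = \sum_i a_i |i\rangle$, then the post-measurement state is $\left(\sum_i a_i \langle i|_A \otimes I_B\right) \sum_j |j\rangle_A \otimes |j\rangle_B = \sum_j a_j |j\rangle_B = T(x)_B$. 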

Proof: Construct the graph $G_Z$ on $[q] \times [d]$ with $(m,j)$ connected to $(m',j')$ iff $|b_{mj}\rangle$ and $|b_{m'j'}\rangle$ are orthogonal. Clearly, $G_Z$ partitions into q cliques corresponding to the q bases in Z, so $\alpha(G_Z) \le q$, and if there was an independent set in $G_Z$ of size q, it would have to have one element in each of the q cliques. But this would correspond to a selection of one vector in each basis in Z such that no two are orthogonal, in contradiction to the fact that Z is a KS basis set. 

Letting $\mathcal{N}_Z$ be a channel with confusability graph $G_Z$, we have just shown that $c_0(\mathcal{N}_Z) < q$. On the other hand, since $T((m,j)) := |b_{mj}\rangle$ clearly defines an orthonormal representation of $G_Z$ in $\mathbb{C}^d$, Theorem 11 tells us that $c_0^{SE}(\mathcal{N}_Z) = q$ (and that this can be achieved using a rank-d maximally entangled state). ■ 

IV. Exactly simulating noisy channels with perfect communication 

This section concerns the "reverse" problem to the zero-error channel coding problem: how much zero-error communication is required to exactly simulate a noisy channel? It will turn out that the one-shot communication cost can differ wildly depending on whether no correlation, shared randomness, entanglement or non-signalling resources are available. However, in the many-copy limit they all turn out to give the same rate, as long as shared randomness is available. Under a relaxed (namely, combinatorial) notion of channel simulation, we find complete reversibility between non-signalling-assisted zero-error channel coding and channel simulation. 

A. Without any assistance 

What does it mean to simulate a channel $\mathcal{N} \in C(X \to Y)$? With a k-symbol identity channel and no other correlations between sender and receiver the most general protocol is simply this: Alice applies a local channel $\mathcal{Q} \in C(X \to [k])$ and sends to Bob the result, on which he applies a channel $\mathcal{R} \in C([k] \to Y)$. The composition should be the desired channel $\mathcal{N} = \mathcal{R} \circ \mathcal{Q}$. 

Theorem 14. For a channel $\mathcal{N} \in C(X \to Y)$, $k_0(\mathcal{N})$ equals the positive rank $\operatorname{rank}_+$ of the transition probability matrix $\mathcal{N}(y|x)$, i.e. the smallest number k of probability distributions on Y such that their convex hull contains all of the output distributions $\mathcal{N}(\cdot|x)$. 

Since the positive rank is lower bounded by the linear rank, we get the following lower bounds: 

$$k_0(\mathcal{N}) \ge \operatorname{rank} \mathcal{N}, \qquad K_0(\mathcal{N}) \ge \log \operatorname{rank} \mathcal{N},$$ 

the latter because the rank is multiplicative. □ 

For instance, the channel $\mathcal{N}_{\mathrm{not}} \in C([n] \to [n])$ with $\mathcal{N}_{\mathrm{not}}(y|x) = 0$ if $y = x$ and $1/(n-1)$ if $y \ne x$, will have $K_0(\mathcal{N}_{\mathrm{not}}) = \log n$, the same as the perfect channel, even though both its Shannon and zero-error capacities are much lower. 
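
The rank lower bound for this example can be checked by hand or, as a minimal sketch, numerically. The helper names `n_not` and `rank` below are illustrative assumptions, not notation from the paper; the code builds the transition matrix of the channel just described and computes its exact rank over the rationals, which is the quantity entering the bound $k_0 \ge \operatorname{rank}$.

```python
from fractions import Fraction

def n_not(n):
    """Transition matrix of the 'not' channel: rows are outputs y, columns inputs x.
    (Hypothetical helper name; the entries follow the definition in the text.)"""
    return [[Fraction(0) if y == x else Fraction(1, n - 1) for x in range(n)]
            for y in range(n)]

def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        # Find a pivot in this column at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# The matrix has full rank n, so the unassisted one-shot cost is at least n.
full_rank = all(rank(n_not(n)) == n for n in range(2, 7))
```

Since n symbols always suffice to simulate a channel with n outputs, full rank pins the unassisted one-shot cost of this channel at exactly n.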

B. With shared randomness 

The set of channels one can perfectly simulate by sending one of k symbols when arbitrary shared randomness is available is simply the convex hull of the set (just described) that can be achieved without shared randomness: 

Theorem 15. For a channel $\mathcal{N} \in C(X \to Y)$, $k_0^{SR}(\mathcal{N})$ is the minimum integer k such that 

$$\mathcal{N} \in \operatorname{conv} \bigcup_{Z \subset Y,\ |Z| \le k} C(X \to Z),$$ 

where we view $C(X \to Z)$ naturally as a subset of $C(X \to Y)$. In fact, on the right hand side, we may replace the sets $C(X \to Z)$ with their corresponding subsets of deterministic channels. As matrices, these are zero/one stochastic matrices with rank $\le k$, and a channel $\mathcal{N}$ has $k_0^{SR}(\mathcal{N}) \le k$ iff its matrix is a convex combination of these. 

Proof: Any protocol to simulate $\mathcal{N}$ exactly using shared randomness and k messages amounts to writing $\mathcal{N}$ as a convex (probability) combination of product channels, 

$$\mathcal{N} = \sum_i p_i\, \mathcal{R}_i \circ \mathcal{Q}_i,$$ 

with $\mathcal{Q}_i \in C(X \to [k])$, $\mathcal{R}_i \in C([k] \to Y)$, and $p_i \ge 0$, $\sum_i p_i = 1$. Since the extreme points in the set of channels are the deterministic channels, we may push the randomness involved in forming a stochastic map into the shared randomness. But since $\mathcal{Q}_i$ has only k output symbols and $\mathcal{R}_i$ is deterministic, the composition $\mathcal{R}_i \circ \mathcal{Q}_i$ can also have only k possible output symbols, forming a subset $Z \subset Y$, $|Z| \le k$. In other words, $\mathcal{N}$ is a convex combination of deterministic channels in $C(X \to Z)$, for k-element subsets $Z \subset Y$. 

C. Non-signalling correlations 

Just as in the case of zero-error communication, making non-signalling correlations freely available gives a very tractable structure to the problem of perfectly simulating noisy channels, and the one-shot communication cost $k_0^{NS}(\mathcal{N})$ has a correspondingly simple form: it is the smallest integer greater than or equal to a certain simple norm of the conditional probability matrix. This norm is multiplicative under tensor products, so the corresponding asymptotic rate $K_0^{NS}(\mathcal{N})$ is just its logarithm. 

Theorem 16. For a channel $\mathcal{N} \in C(X \to Y)$, 

$$k_0^{NS}(\mathcal{N}) = \Big\lceil \sum_y \max_x \mathcal{N}(y|x) \Big\rceil,$$ 

and, since $\sum_y \max_x \mathcal{N}(y|x)$ is multiplicative under tensor products (note that this function is a norm on stochastic matrices), the corresponding asymptotic rate is just 

$$K_0^{NS}(\mathcal{N}) = \log \Big( \sum_y \max_x \mathcal{N}(y|x) \Big).$$ 



Proof: If $\mathcal{N}$ is in $C(X \to Y)$, it can be simulated with a k-input identity channel and non-signalling correlations if and only if there exists P in $\mathrm{NS}(X \to [k], [k] \to Y)$ such that 

$$\sum_z P(z, y|x, z) = \mathcal{N}(y|x). \qquad (1)$$ 

Again the twirling procedure in the proof of Theorem 7 simplifies things, but now the symmetry is in the identity channel used for the simulation. Defining 

$$P'(z, y|x, \hat z) := \frac{1}{k!} \sum_{\pi \in S_k} P(\pi(z), y|x, \pi(\hat z)),$$ 

where $S_k$ is the symmetric group on k elements and $\pi(z)$ is the image of z under the permutation $\pi$, yields a correlation which simulates the same channel when used in place of P (summing over $\pi(z) = \pi(\hat z)$ is the same as summing over $z = \hat z$), but where 

$$P'(z, y|x, \hat z) = \begin{cases} D_{yx} & \text{if } z = \hat z, \\ Q_{yx} & \text{if } z \ne \hat z. \end{cases}$$ 

We now list the conditions on P' (in terms of D and Q): 

(1) The correctness of the simulation is given by Eq. (1): 

$$D_{yx} = \mathcal{N}(y|x)/k.$$ 

(2a) The conditions for no signalling from Alice to Bob are 

$$\sum_{z \in [k]} P'(z, y|x, \hat z) = \sum_{z \in [k]} P'(z, y|x', \hat z)$$ 

for all x, x', which reduce to $D_{yx} + (k-1) Q_{yx} = D_{yx'} + (k-1) Q_{yx'}$, so we write $D_{yx} + (k-1) Q_{yx} = u_y$. Clearly we require that $\sum_y u_y = 1$ and $u_y \ge 0$ (in fact, $u_y$ is just the marginal distribution of the output y, which is independent of both inputs, like in the PR box). 

(2b) The conditions for no signalling from Bob to Alice, 

$$\sum_y P'(z, y|x, \hat z) = \sum_y P'(z, y|x, \hat z'),$$ 

which reduce to $\sum_y D_{yx} = \sum_y Q_{yx}\ \forall x$, are already ensured by the condition that $D_{yx} = \mathcal{N}(y|x)/k$ and $\sum_y \left[ D_{yx} + (k-1) Q_{yx} \right] = 1$ for all x (these mean that $\sum_y D_{yx} = \sum_y Q_{yx} = 1/k$ for all x). 

(3) The only other constraint is that the entries of Q are non-negative. 

Putting these constraints together, we see that a suitable P' (and hence P) exists if and only if there is a probability vector u such that the resulting Q matrix has non-negative entries, i.e. 

$$u_y - \mathcal{N}(y|x)/k \ge 0$$ 

for all y, x. Such a u is possible if and only if 

$$\sum_y \max_x \mathcal{N}(y|x)/k \le 1. \qquad ■$$ 

Remark 17. It is not hard to verify directly that the bit rate needed to perfectly simulate $\mathcal{N}$ with free non-signalling correlations is at least the Shannon capacity of $\mathcal{N}$. If the channel input is the random variable $R_X$ where $\Pr(R_X = x) = p(x)$ and the resulting channel output is the random variable $R_Y$ then 

$$I(R_X : R_Y) = \sum_{x,y} \mathcal{N}(y|x) p(x) \log \frac{\mathcal{N}(y|x)}{\sum_z \mathcal{N}(y|z) p(z)} \le \log \sum_{x,y} \frac{\mathcal{N}(y|x)^2 p(x)}{\sum_z \mathcal{N}(y|z) p(z)} \le \log \sum_{x,y} \frac{\mathcal{N}(y|x) p(x) \max_r \mathcal{N}(y|r)}{\sum_z \mathcal{N}(y|z) p(z)} = \log \sum_y \max_r \mathcal{N}(y|r).$$ 

The first inequality uses the concavity of the logarithm. The Shannon capacity of $\mathcal{N}$ is obtained by maximising the left-hand side over all input distributions p. 
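
The formula of Theorem 16 is simple enough to evaluate directly. The following is a minimal sketch, assuming a channel is stored as a list of rows indexed by the output y with one entry per input x; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import ceil, log2

def ns_norm(channel):
    """Sum over outputs y of the maximum over inputs x of N(y|x).
    `channel[y][x]` holds N(y|x) (an assumed storage convention)."""
    return sum(max(row) for row in channel)

def one_shot_ns_cost(channel):
    """Number of noiseless symbols needed with non-signalling assistance."""
    return ceil(ns_norm(channel))

def asymptotic_ns_rate(channel):
    """Asymptotic simulation rate in bits per channel use."""
    return log2(ns_norm(channel))

# Ternary erasure channel with transmission probability 1/2:
# outputs 0, 1, 2 and an erasure symbol (last row).
half = Fraction(1, 2)
T_half = [[half, 0, 0],
          [0, half, 0],
          [0, 0, half],
          [half, half, half]]

cost = one_shot_ns_cost(T_half)    # norm is 3 * 1/2 + 1/2 = 2
rate = asymptotic_ns_rate(T_half)  # log2(2) = 1 bit
```

Exact rational arithmetic is used so that the ceiling is not affected by floating-point rounding when the norm is an integer.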

D. Arbitrarily large gap between $k_0(\mathcal{N})$, $k_0^{SR}(\mathcal{N})$, and $k_0^{NS}(\mathcal{N})$ 

Shared randomness is one type of non-signalling correlation so it is clear that $k_0^{SR}(\mathcal{N}) \ge k_0^{NS}(\mathcal{N})$. It turns out that there can be an arbitrarily large gap between these two costs. This is the case for the "universal channels" to be defined below. 

Definition 18. Recall that the set of all size-m subsets of [n] is denoted by $\binom{[n]}{m}$. The universal channel $\mathcal{U}_{n,m}$ is the channel in $C\big(\binom{[n]}{m} \to [n]\big)$ with 

$$\mathcal{U}_{n,m}(y|x) = \begin{cases} 1/m & \text{if } y \in x, \\ 0 & \text{if } y \notin x. \end{cases}$$ 

In words, the channel takes as input a set $x \in \binom{[n]}{m}$ and outputs a random element of that set. 

Note that $\mathcal{N}_{\mathrm{not}}$ introduced earlier in this section is a special case of the universal channel with $m = n - 1$. 

The universal channels have a great deal of symmetry. The symmetric group $S_n$ acts on both the input and the output alphabet of $\mathcal{U}_{n,m}$: on the latter naturally as permutations of the symbols (written $\pi(y)$), on the former as simultaneous permutations of all elements in the sets $x \subset [n]$ (written $x^\pi$). With these actions, $\mathcal{U}_{n,m}$ is $S_n$-covariant: 

$$\forall y, x, \pi : \quad \mathcal{U}_{n,m}(y|x) = \mathcal{U}_{n,m}(\pi(y)|x^\pi). \qquad (2)$$ 

A beautiful consequence of the covariance is that it specifies $\mathcal{U}_{n,m}$ almost uniquely: $\mathcal{U}_{n,m}$ is the only channel satisfying eq. (2) and in addition $\mathcal{U}_{n,m}(y|x) = 0$ if $y \notin x$. 

To simulate $\mathcal{U}_{n,m}$ with zero error when assisted by arbitrary non-signalling correlations, Theorem 16 shows one needs a noiseless channel of 

$$k_0^{NS}(\mathcal{U}_{n,m}) = \Big\lceil \sum_y \max_x \mathcal{U}_{n,m}(y|x) \Big\rceil = \Big\lceil \frac{n}{m} \Big\rceil$$ 

many symbols, and this is sufficient. The minimal asymptotic rate of communication needed given free non-signalling correlations is $\log \frac{n}{m}$. On the other hand, when only shared randomness is available, the communication cost can be much higher: 

Proposition 19. For any $n > m \ge 1$, 

$$k_0^{SR}(\mathcal{U}_{n,m}) = n - m + 1.$$ 

Proof: We first show that $k = n - m$ is not sufficient by contradiction. Recall Theorem 15 and consider an element $\mathcal{N}$ from $C(X \to Z)$, with $Z \subset Y$, $|Z| = n - m$, in the convex decomposition of $\mathcal{U}_{n,m}$ that comes with a strictly positive weight p. That means, for any input x, 

$$p\, \mathcal{N}(\cdot|x) \le \mathcal{U}_{n,m}(\cdot|x)$$ 

in the sense of element-wise ordering of the probability vectors. Choosing $x = [n] \setminus Z$, which has cardinality m, leads to the desired contradiction: restricted to Z, $\mathcal{U}_{n,m}(\cdot|x)$ is the zero vector (see the definition), whereas $\mathcal{N}(\cdot|x)$ has all of its probability mass in Z. 

On the other hand, there is a protocol that uses only $n - m + 1$ messages: The shared randomness is a uniformly distributed subset $T \in \binom{[n]}{n-m+1}$. On input x, Alice selects a uniformly random element $y \in x \cap T$, which is non-empty by the pigeonhole principle. To send y to Bob, she needs only a number from 1 through $n - m + 1$ to specify where y occurs in T. Clearly, this protocol, and hence the simulated channel, has the same $S_n$-covariance as $\mathcal{U}_{n,m}$. Furthermore, the simulated channel assigns zero conditional probability to all $y \notin x$ for input x. Thus, the simulated channel must be $\mathcal{U}_{n,m}$. ■ 
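
For small parameters, the shared-randomness protocol in the proof can be verified exhaustively rather than by the covariance argument. The sketch below is an illustration under assumed conventions (symbols are 0, ..., n-1, channels are dictionaries, function names are hypothetical): it averages over every shared subset T of size n - m + 1 and checks that the induced channel is exactly the universal channel.

```python
from fractions import Fraction
from itertools import combinations

def universal_channel(n, m):
    """U_{n,m}: maps each size-m input set x to the uniform distribution on x."""
    return {frozenset(x): {y: Fraction(1, m) for y in x}
            for x in combinations(range(n), m)}

def simulated_channel(n, m):
    """Channel induced by the protocol: shared uniform subset T of size
    k = n - m + 1; Alice sends the position in T of a uniform y in x & T."""
    k = n - m + 1
    shared = [frozenset(t) for t in combinations(range(n), k)]
    channel = {}
    for x in combinations(range(n), m):
        x = frozenset(x)
        dist = {}
        for t in shared:
            overlap = x & t  # non-empty since |x| + |T| = n + 1 > n
            for y in overlap:
                weight = Fraction(1, len(shared) * len(overlap))
                dist[y] = dist.get(y, Fraction(0)) + weight
        channel[x] = dist
    return channel

# Exact equality of all transition probabilities for a small instance.
matches = simulated_channel(5, 3) == universal_channel(5, 3)
```

Because Bob knows T, the only communication is the index of y within T, which takes one of n - m + 1 values.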

These universal channels provide simple and highly structured examples for separating $k_0$, $k_0^{SR}$, and $k_0^{NS}$. We have already seen that for $m = n - 1$, $k_0(\mathcal{U}_{n,m}) = n$ but Prop. 19 says that $k_0^{SR}(\mathcal{U}_{n,m}) = 2$. In a different regime, for example when n is even and $m = \frac{n}{2}$, $k_0^{SR}(\mathcal{U}_{n,m}) = \frac{n}{2} + 1$ but $k_0^{NS}(\mathcal{U}_{n,m}) = 2$. In both cases, the separation is of order n, which is maximal given the size of the input alphabet. 



E. Shared entanglement 

The possibility of large separations between $k_0^{SR}(\mathcal{N})$ and $k_0^{NS}(\mathcal{N})$ raises the question of where the power of the intermediate shared entanglement class fits in between the two. While we do not have a general understanding of this matter yet, we can at least give examples where entanglement beats shared randomness and cases where general non-signalling correlations can beat entanglement. 

Let $\mathcal{T}_p$ denote the ternary erasure channel with transmission probability p: 

$$\mathcal{T}_p = \begin{pmatrix} p & 0 & 0 \\ 0 & p & 0 \\ 0 & 0 & p \\ 1-p & 1-p & 1-p \end{pmatrix}.$$ 

We can use the ideas of the appendix to show that non-signalling correlations can beat shared entanglement for one-shot simulation of $\mathcal{T}_{1/2}$. 

Proposition 20. Whereas $k_0^{NS}(\mathcal{T}_{1/2}) = 2$, $k_0^{SE}(\mathcal{T}_{1/2}) = 3$. Therefore, using a single (perfect) bit of communication, strictly more channels can be simulated if generalised non-signalling correlations are available rather than entanglement. 

Proof: By using the twirling procedure of Theorem 16 (which can be assumed w.l.o.g. whenever shared randomness is available), and considering the simplified non-signalling constraints which result, it is not hard to see that exact simulation of $\mathcal{T}_{1/2}$ with a single bit is equivalent to the ability to realise a particular non-signalling correlation $P^*$ defined, in the notation of Theorem 16, by $k = 2$, 

$$D = \frac{1}{4} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$ 

and $Q_{yx} = 1/4 - D_{yx}$. This $P^*$ must be in the class of available correlations. Applying the observations of the appendix, and looking at the 8 conditional channels of $P^*$, one finds that they are all pair-wise distinguishable and therefore $P^*$ cannot be in the class SE, by Proposition 31. 



Proposition 21. Strictly more classical channels can be simulated using shared entanglement than can with shared randomness. In particular, there are channels $\mathcal{N}$ for which $k_0^{SE}(\mathcal{N}) = 2$ but $k_0^{SR}(\mathcal{N}) \ge 3$. 

Proof: We construct these channels and demonstrate the separation in three steps. First, we show that $\mathcal{T}_{1/2}$ can be simulated using one bit of communication and a certain non-signalling correlation called the "PR-box". Then, using the same protocol, but replacing the PR-box by a weaker correlation obtainable via shared entanglement, we write down the channel $\mathcal{N}$ that is being simulated. Finally, we explain why $\mathcal{N}$ cannot be simulated with shared randomness and one bit of communication. 

The PR-box (introduced by Popescu and Rohrlich [20]) is a particular correlation $P(s, t|a, b)$ given by (rows labelled by the outputs $(s,t)$, columns by the inputs $(a,b)$): 

$$\begin{array}{c|cccc} & (0,0) & (0,1) & (1,0) & (1,1) \\ \hline (0,0) & 1/2 & 1/2 & 1/2 & 0 \\ (0,1) & 0 & 0 & 0 & 1/2 \\ (1,0) & 0 & 0 & 0 & 1/2 \\ (1,1) & 1/2 & 1/2 & 1/2 & 0 \end{array}$$ 

In other words, the outputs s, t are random bits except for the constraint $s \oplus t = a \cdot b$. We let z be the message sent via the classical channel. 

We first give an explicit method for simulating $\mathcal{T}_{1/2}$ using a bit of communication and a single use of a PR-box. Bob always chooses his PR-box input b to be the channel output z and outputs $(z, t)$ for the simulation. Alice has a message $x \in \{0, 1, 2\}$. If $x = 0$, she sets her PR-box input to be $a = 0$. The PR-box outputs a random bit s. She sends $z = s$ to Bob, who puts it in the PR-box and obtains $t = s$. Thus, Bob outputs (0, 0) or (1, 1) randomly. If $x = 1$, Alice sets $a = 1$ and her PR-box outputs a random bit s. She chooses $z = s$. For either value of s, $t = 0$. Thus, Bob outputs (0, 0) or (1, 0) randomly. If $x = 2$, Alice sets $a = 0$ and $z = 0$. Thus, Bob outputs (0, 0) or (0, 1) randomly. Identifying the outputs (0, 0), (1, 1), (1, 0) and (0, 1) with the erasure symbol E, 0, 1, and 2 respectively, the above simulates $\mathcal{T}_{1/2}$ perfectly. 
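
The case analysis above can be checked mechanically. The following sketch is an illustration only (the function and variable names are assumptions, not from the paper): it encodes the PR-box as a conditional probability table, runs the protocol for each message x, and collects Bob's exact output distribution.

```python
from fractions import Fraction

def pr_box(s, t, a, b):
    """P(s,t|a,b): uniformly random bits subject to s XOR t = a AND b."""
    return Fraction(1, 2) if (s ^ t) == (a & b) else Fraction(0)

# Identification of Bob's output pair (z, t) with the erasure channel outputs.
LABEL = {(0, 0): "E", (1, 1): 0, (1, 0): 1, (0, 1): 2}

def simulated_output_distribution(x):
    """Distribution of Bob's labelled output when Alice's message is x."""
    a = 1 if x == 1 else 0           # Alice's PR-box input
    dist = {}
    for s in (0, 1):                 # Alice's PR-box output
        z = s if x in (0, 1) else 0  # the single bit sent to Bob
        b = z                        # Bob feeds the received bit into the box
        for t in (0, 1):             # Bob's PR-box output
            p = pr_box(s, t, a, b)
            if p:
                out = LABEL[(z, t)]
                dist[out] = dist.get(out, Fraction(0)) + p
    return dist

# Each message is delivered with probability 1/2 and erased otherwise.
table = {x: simulated_output_distribution(x) for x in (0, 1, 2)}
```

Letting Bob's input depend on Alice's output s is legitimate here because the box is non-signalling, so Alice can obtain s before Bob chooses b.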

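In the second step, we generalize the PR-box to correlations $P_\lambda$ given by (rows labelled by the outputs $(s,t)$, columns by the inputs $(a,b)$): 

$$\begin{array}{c|cccc} & (0,0) & (0,1) & (1,0) & (1,1) \\ \hline (0,0) & \lambda/2 & \lambda/2 & \lambda/2 & \bar\lambda/2 \\ (0,1) & \bar\lambda/2 & \bar\lambda/2 & \bar\lambda/2 & \lambda/2 \\ (1,0) & \bar\lambda/2 & \bar\lambda/2 & \bar\lambda/2 & \lambda/2 \\ (1,1) & \lambda/2 & \lambda/2 & \lambda/2 & \bar\lambda/2 \end{array}$$ 

where $\bar\lambda = 1 - \lambda$. Note that $P_1$ is the PR-box. The PR-box can be approximated using a maximally entangled pair of qubits. An optimal approximation, in terms of the CHSH violation, yields the correlation $P_\lambda$ with $\lambda = (1 + 1/\sqrt{2})/2 \approx 0.85$. If this entanglement-based approximation of the PR-box is substituted into the protocol given above, the resulting channel $\mathcal{N}$ is given by (columns labelled by the input x, rows by the output): 

$$\begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline E \equiv (0,0) & \alpha & \alpha & 1/2 \\ 0 \equiv (1,1) & \alpha & \beta & 0 \\ 1 \equiv (1,0) & \beta & \alpha & 0 \\ 2 \equiv (0,1) & \beta & \beta & 1/2 \end{array}$$ 

where $\alpha = (1 + 1/\sqrt{2})/4 \approx 0.43$ and $\beta = 1/2 - \alpha \approx 0.07$. 

Finally, we check with a computer that $\mathcal{N}$ is not a convex combination of rank-two, zero-one, stochastic matrices and so, according to Theorem 15, it can't be exactly simulated using one bit of communication if only shared randomness is available. ■ 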



F. Asymptotic equality of correlation-assisted communication costs 

Among the results of this section so far are channels proving separations between the communication costs of NS-, SE- and SR-assisted channel simulation for a single channel use. In the case of NS vs. SR, the universal channels of Definition 18 show that this gap can be arbitrarily large. Despite this, we will prove in Theorem 24 that when simulating many uses of a channel, a protocol using shared randomness can achieve an asymptotic rate of communication as low as the optimal rate with non-signalling assistance derived in Theorem 16. Since the rate with entanglement assistance is sandwiched between these two rates, it follows that $K_0^{SR}(\mathcal{N}) = K_0^{SE}(\mathcal{N}) = K_0^{NS}(\mathcal{N}) = \log \sum_y \max_x \mathcal{N}(y|x)$, for all channels $\mathcal{N}$. 

The proof is structured into two steps. First, we show that the asymptotic equality discussed above holds for all the universal channels. Then, roughly speaking, any channel can be exactly simulated by a universal channel with the same value of $K_0^{NS}$. It is this ability of the channels $\mathcal{U}_{n,m}$ that earns them the name "universal." We need a lemma for this proof: 

Lemma 22. We call a set $T \subset [n]^q$ m-touching if 

$$\forall x_1, \ldots, x_q \in \binom{[n]}{m} : \quad T \cap (x_1 \times x_2 \times \cdots \times x_q) \ne \emptyset.$$ 

There is an m-touching set of cardinality $\min\{n^q,\ 2nq\,(\frac{n}{m})^q\}$. 

Proof: If a set is populated by picking $r = 2nq\,(\frac{n}{m})^q$ elements of $[n]^q$ uniformly at random (with replacement), the probability that it is not m-touching is bounded above by 

$$P_{\mathrm{fail}} \le \binom{n}{m}^{q} \left( 1 - \left( \frac{m}{n} \right)^{q} \right)^{r}.$$ 

With the simple estimates $\binom{n}{m} \le 2^n$ and $\ln(1 - x) \le -x$, 

$$\ln P_{\mathrm{fail}} \le qn \ln 2 - r \left( \frac{m}{n} \right)^{q} = (\ln 2 - 2)\, qn < 0,$$ 

so a set with the desired property and cardinality must exist. Indeed, the probability that a set chosen in the way described above isn't m-touching is exponentially small in qn. ■ 

Proposition 23. For any universal channel $\mathcal{U}_{n,m}$, 

$$K_0^{SR}(\mathcal{U}_{n,m}) = K_0^{NS}(\mathcal{U}_{n,m}) = \log \frac{n}{m}.$$ 

Proof: By definition, $K_0^{SR}(\mathcal{U}_{n,m}) \ge K_0^{NS}(\mathcal{U}_{n,m})$, so it suffices to exhibit a protocol using only shared randomness that achieves this bound. To be precise, for q copies of the channel, we prove the existence of such a protocol which uses the transmission of one of 

$$k = \min\left\{ n^q,\ \left\lfloor 2qn \left( \frac{n}{m} \right)^{q} \right\rfloor \right\}$$ 

symbols. The rate is $\le \log \frac{n}{m} + \frac{1}{q} \log 2qn$, which approaches $\log \frac{n}{m}$ as $q \to \infty$. 

The protocol works as follows: Alice and Bob agree on an m-touching set T of size k (see Lemma 22). They share randomness in the form of q uniformly random permutations $\pi_1, \ldots, \pi_q \in S_n$. On input $(x_1, \ldots, x_q)$ Alice picks a uniformly random element $(y_1, \ldots, y_q) \in T^{\pi_1, \ldots, \pi_q} \cap (x_1 \times x_2 \times \cdots \times x_q)$, where 

$$T^{\pi_1, \ldots, \pi_q} = \{ (\pi_1(z_1), \ldots, \pi_q(z_q)) : (z_1, \ldots, z_q) \in T \}$$ 

is the set T with its elements permuted according to $\pi_j$ in coordinate $j = 1, \ldots, q$. The intersection is guaranteed to be non-empty because $T^{\pi_1, \ldots, \pi_q}$ is also m-touching. To send y, she only needs a number from 1 through k to specify the location within $T^{\pi_1, \ldots, \pi_q}$, since the latter is known to Bob. 

This protocol evidently simulates an $S_n^{\times q}$-covariant channel with the property that $\mathcal{N}(y_1 \ldots y_q|x_1 \ldots x_q) = 0$ whenever $y_1 \ldots y_q \notin x_1 \times \cdots \times x_q$. As discussed before, this means that the simulated channel must be $\mathcal{U}_{n,m}^{\otimes q}$. ■ 

Theorem 24. For any channel $\mathcal{N} \in C(X \to Y)$, 

$$K_0^{SR}(\mathcal{N}) = K_0^{NS}(\mathcal{N}) = \log \sum_y \max_x \mathcal{N}(y|x).$$ 

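Proof: First, suppose all the entries of $\mathcal{N}(y|x)$ are rational numbers, with common denominator M, so that $\mathcal{N}(y|x) = \frac{1}{M} t(y|x)$ for integers $t(y|x)$. Split up each output symbol y into $t_y := \max_x t(y|x)$ many, denoted $(y, j)$, with $j = 1, \ldots, t_y$. Now define a new channel $\widetilde{\mathcal{N}}$ by letting $\widetilde{\mathcal{N}}((y, j)|x)$ be either 0 or $1/M$, in such a way that $\mathcal{N} = \Pi \circ \widetilde{\mathcal{N}}$ with the projection map/channel $\Pi : (y, j) \mapsto y$. Clearly $\widetilde{\mathcal{N}}$ is a sub-channel (i.e. a restriction on the input alphabet) of the universal channel $\mathcal{U}_{n,m}$ with $n = \sum_y t_y$ and $m = M$. It can therefore be exactly simulated using shared randomness by the protocol of Proposition 23. This requires asymptotic communication rate 

$$\log \frac{n}{m} = \log \frac{1}{M} \sum_y t_y = \log \sum_y \max_x \mathcal{N}(y|x),$$ 

which is precisely the lower bound set by $K_0^{NS}(\mathcal{N})$, so the claim holds for rational $\mathcal{N}$. 

For the general case, pick a large integer M and let, for all $x \in X$, $y \in Y$, $t(y|x) := \lfloor M \mathcal{N}(y|x) \rfloor$. Now adjoin new elements $x'$ ($x \in X$) to the output alphabet, i.e. define $\widetilde{Y} := Y \cup X'$ and a new channel $\widetilde{\mathcal{N}} : X \to \widetilde{Y}$ with 

$$\widetilde{\mathcal{N}}(y|x) = \frac{t(y|x)}{M}, \qquad \widetilde{\mathcal{N}}(x'|x) = 1 - \sum_y \frac{t(y|x)}{M}.$$ 

Now, $\mathcal{N}$ can be simulated by $\widetilde{\mathcal{N}}$ using post-processing by Bob only: if $y \in Y$ is obtained, then it is left alone; if $x' \in X'$ is seen, then Bob uses local randomness to output y with probability 

$$Q(y|x') = \frac{1}{\widetilde{\mathcal{N}}(x'|x)} \left( \mathcal{N}(y|x) - \widetilde{\mathcal{N}}(y|x) \right).$$ 

So, extending Q to a proper channel by letting $Q(y|y') = 1$ for $y' \in Y$ iff $y = y'$, we have $\mathcal{N} = Q \circ \widetilde{\mathcal{N}}$. 

Second, the cost of simulating $\widetilde{\mathcal{N}}$ is 

$$K_0^{SR}(\widetilde{\mathcal{N}}) = \log \Big( \sum_y \max_x \widetilde{\mathcal{N}}(y|x) + \sum_x \widetilde{\mathcal{N}}(x'|x) \Big) \le \log \Big( \sum_y \max_x \mathcal{N}(y|x) + \frac{1}{M} |X||Y| \Big).$$ 

Letting $M \to \infty$, this rate approaches 

$$\log \Big( \sum_y \max_x \mathcal{N}(y|x) \Big) = K_0^{NS}(\mathcal{N}).$$ 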
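
The output-splitting step for rational channels can be made concrete. This is a minimal sketch under assumed conventions (a channel is a list of rows indexed by y holding `Fraction` entries; `split_channel` and `common_denominator` are hypothetical names). It returns the parameters n and M of the universal channel together with, for each input x, the set of split symbols (y, j) on which the split channel is uniform.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def common_denominator(channel):
    """Least common multiple of all denominators of the entries N(y|x)."""
    return reduce(lambda a, b: a * b // gcd(a, b),
                  (p.denominator for row in channel for p in row), 1)

def split_channel(channel):
    """Split each output y into t_y = max_x t(y|x) copies (y, j).

    Returns (n, M, supports), where supports[x] is the set of split symbols
    that receive probability 1/M on input x. Each support has exactly M
    elements out of n = sum_y t_y, i.e. it is an input of U_{n,M}."""
    M = common_denominator(channel)
    t = [[int(p * M) for p in row] for row in channel]
    num_inputs = len(channel[0])
    supports = [set() for _ in range(num_inputs)]
    for y, row in enumerate(t):
        for x, count in enumerate(row):
            # Use the first t(y|x) copies of y; any choice of copies would do.
            supports[x].update((y, j) for j in range(count))
    n = sum(max(row) for row in t)
    return n, M, supports

# Example with common denominator 5: N = (1/5) [[4, 2], [1, 3]].
example = [[Fraction(4, 5), Fraction(2, 5)],
           [Fraction(1, 5), Fraction(3, 5)]]
n, M, supports = split_channel(example)
```

Projecting (y, j) back to y recovers t(y|x)/M for every pair, so the original channel is obtained from a restriction of the universal channel by post-processing alone.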

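To illustrate the idea, if $\mathcal{N}_0 \prec \mathcal{N}_1$ denotes the partial order on channels "$\mathcal{N}_0$ is equal to $\mathcal{R} \circ \mathcal{N}_1 \circ \mathcal{Q}$, for some channels $\mathcal{R}$, $\mathcal{Q}$", the proof uses 

$$\frac{1}{5} \begin{pmatrix} 4 & 2 \\ 1 & 3 \end{pmatrix} \prec \frac{1}{5} \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \prec \mathcal{U}_{7,5}$$ 

to show that $K_0^{SR}(\mathcal{N}) \le K_0^{SR}(\mathcal{U}_{7,5}) = \log 7/5 = K_0^{NS}(\mathcal{N})$. 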

Remark 25. The classical reverse Shannon theorem also yields an exact simulation of the noisy channel using shared randomness. The difference to our result is explained by the different way of accounting for the communication: whereas the reverse Shannon theorem shows that the expected rate of communication (with respect to the shared randomness used in the protocol) is the normal Shannon capacity of the channel, here we consider the much more stringent worst-case communication cost. 



G. Weak simulation and reversibility 

Looking over the formulas for $C_0^{NS}$ and $K_0^{NS}$ of a channel $\mathcal{N}$, we notice that the former only depends on the channel hypergraph, while the latter actually involves the transition probabilities. Hence it is not surprising that the former is typically strictly smaller than the latter. However, if we are content with the simulation of any channel that has the same hypergraph, we recover reversibility: 

Proposition 26. Let $\mathcal{N} \in C(X \to Y)$ with channel hypergraph $H(\mathcal{N})$ (having hyperedges $\{e_y : y \in Y\}$). Then, 

$$\min\{ K_0^{NS}(\mathcal{M}) : H(\mathcal{M}) = H(\mathcal{N}) \} = \log \omega^*(H(\mathcal{N})),$$ 

where $\omega^*(H(\mathcal{N}))$ is the fractional covering number of the hypergraph of the channel. Since the fractional covering number is equal to the fractional packing number $\alpha^*(H)$, this minimum rate is also equal to $C_0^{NS}(\mathcal{N})$. 

Proof: Recall the formula for $K_0^{NS}(\mathcal{N})$: it is the logarithm of the value of the following linear program (all variables understood as non-negative): 

$$\min\Big\{ \sum_{y \in Y} w(y) : \forall x, y \quad w(y) \ge \mathcal{N}(y|x) \Big\}.$$ 

The additional minimisation over channels with prescribed hypergraph H is also a linear program: 

$$\min\Big\{ \sum_{y \in Y} w(y) : w(y) \ge \mathcal{N}(y|x),\ \ \mathcal{N}(y|x) = 0 \text{ if } x \notin e_y,\ \ \sum_y \mathcal{N}(y|x) = 1 \Big\}.$$ 

But this is evidently equivalent to 

$$\min\Big\{ \sum_{y \in Y} w(y) : \forall x \in X, \sum_{y \text{ with } e_y \ni x} w(y) \ge 1 \Big\},$$ 

which is exactly the fractional covering number $\omega^*(H)$. For the other statements see Proposition 6. 


V. Conclusion 

Let us summarise the results (both our own, and others) discussed in this paper. For zero-error communication, we found both the one-shot and asymptotic non-signalling assisted capacities, upper bounding the chains of operationally obvious inequalities: 

$$\alpha(G(\mathcal{N})) = c_0(\mathcal{N}) = c_0^{SR}(\mathcal{N}) \le c_0^{SE}(\mathcal{N}) \le c_0^{NS}(\mathcal{N}) = \lfloor \alpha^*(H(\mathcal{N})) \rfloor,$$ 

$$\log \Theta(\mathcal{N}) = C_0(\mathcal{N}) = C_0^{SR}(\mathcal{N}) \le C_0^{SE}(\mathcal{N}) \le C_0^{NS}(\mathcal{N}) = \log \alpha^*(H(\mathcal{N})).$$ 

These upper bounds on the entanglement-assisted capacities from non-signalling are improved upon by the results of [16] and [17], which show that the Lovász theta bound applies even in the entanglement-assisted case: $c_0^{SE}(G) \le \vartheta(G)$. 

While we proved that $c_0(\mathcal{N}) \le c_0^{SE}(\mathcal{N})$ can be strict, we don't yet know whether the same can be said of the asymptotic rates, and regard this as one of the main open problems. 

In the reverse problem of exactly simulating noisy channels, the non-signalling assisted case was again completely soluble, providing lower bounds on the chain 

$$\Big\lceil \sum_y \max_x \mathcal{N}(y|x) \Big\rceil = k_0^{NS}(\mathcal{N}) \le k_0^{SE}(\mathcal{N}) \le k_0^{SR}(\mathcal{N}) \le k_0(\mathcal{N}).$$ 

For each inequality in this chain of one-shot costs, a channel showing that it can be strict was exhibited. Some open questions remain regarding the potential sizes of these separations (see Section IV). 

For the asymptotic rates of communication things were shown to be simpler: While large gaps can exist between the costs with free shared randomness and without ($K_0(\mathcal{N}) \ge \log \operatorname{rank} \mathcal{N}$), given free correlations from any class of non-signalling correlations which contains shared randomness the rates are equal: 

$$\log \Big( \sum_y \max_x \mathcal{N}(y|x) \Big) = K_0^{NS}(\mathcal{N}) = K_0^{SE}(\mathcal{N}) = K_0^{SR}(\mathcal{N}).$$ 

Appendix 
Notations 

$[n]$: The set $\{1, \ldots, n\}$. 
$C(X \to Y)$: The set of classical channels with input alphabet X and output alphabet Y. 
$C(A \to S, B \to T)$: The set of bipartite classical channels with input alphabets A (for Alice) and B (for Bob) and respective output alphabets S and T. 
$\Omega$: Some class of correlations: one of NS = non-signalling, SE = shared entanglement, SR = shared randomness, NC (or omitted) = no correlation. 
$\Omega(A \to S, B \to T)$: The subset of $C(A \to S, B \to T)$ in the class $\Omega$. 
$\mathcal{N}(y|x)$: The probability that the channel $\mathcal{N}$ outputs symbol y when symbol x is input. 
$E(G)$: Edges of the graph G. 
$E(H)$: Hyperedges of the hypergraph H. 
$G(\mathcal{N})$: Confusability graph of the channel $\mathcal{N}$. 
$H(\mathcal{N})$: Hypergraph of the channel $\mathcal{N}$. 
$\chi(G)$: The clique hypergraph of the graph G. 
$\alpha(G)$: The independence number of the graph G. 
$\alpha^*(H)$: The fractional packing number of the hypergraph H. 
$\omega^*(H)$: The fractional covering number of the hypergraph H. 
$c_0^{\Omega}(\mathcal{N})$: One-shot zero-error capacity of $\mathcal{N}$ assisted by $\Omega$. 
$C_0^{\Omega}(\mathcal{N})$: Zero-error capacity of $\mathcal{N}$ assisted by $\Omega$. 
$k_0^{\Omega}(\mathcal{N})$: One-shot simulation cost of $\mathcal{N}$ assisted by $\Omega$. 
$K_0^{\Omega}(\mathcal{N})$: Simulation cost of $\mathcal{N}$ assisted by $\Omega$. 

Appendix 
Pair-wise versus mutual distinguishability for sets of local residual states of correlations 

Definition 27. We say that two classical channels $\mathcal{N}$ and $\mathcal{M}$ in $C(X \to Y)$ are pair-wise distinguishable, and write $\mathcal{N} \bowtie \mathcal{M}$, if there is an input $x^* \in X$ such that 

$$\sum_{y \in Y} \mathcal{N}(y|x^*)\, \mathcal{M}(y|x^*) = 0.$$ 

If Alice makes an input a to her side of a bipartite correlation $P \in C(A \to X, B \to Y)$, and obtains the output x, then the conditional distribution on Bob's side is a classical channel $P_{ax}(y|b) \in C(B \to Y)$, where $P_{ax}(y|b)$ is simply the conditional distribution $P(y|b, a, x)$ given by Bayes' rule, but written differently to emphasise the fact that we regard a and x as fixed. Similarly, there are $|B||Y|$ such conditional channels $P_{by}(x|a)$ on Alice's side. 

Definition 28. We say that a class of bipartite correlations $\Omega$ has property $\mathrm{PW}_A$ if the existence of a correlation $P(x, y|a, b) \in \Omega(A \to X, B \to Y)$, and $S \subset A \times X$, satisfying 

$$P_{ax} \bowtie P_{a'x'} \quad \text{for all distinct } (a, x), (a', x') \in S$$ 

implies the existence of another correlation, $P'(x, y|a, b) \in \Omega(A \to X, B \cup \{b^*\} \to Y \cup Y')$, which is identical to P when restricted to the input alphabets of P, 

$$\forall a \in A, x \in X, b \in B, y \in Y : \quad P'(x, y|a, b) = P(x, y|a, b),$$ 

but which has an extra input $b^*$ on Bob's side such that 

$$\text{for all distinct } (a, x), (a', x') \in S : \quad \sum_{y \in Y \cup Y'} P'_{ax}(y|b^*)\, P'_{a'x'}(y|b^*) = 0.$$ 

$\Omega$ has property $\mathrm{PW}_B$ if it satisfies the same condition with the roles of the parties reversed. If $\Omega$ has property $\mathrm{PW}_A$ and property $\mathrm{PW}_B$ then we simply say it has property PW. 

In other words, if a correlation P belongs to a $\mathrm{PW}_A$ class $\Omega$, and the graph induced on $A \times X$ by the pair-wise distinguishability relation between the conditional states $P_{ax}$ associated with the vertices contains a clique S, then there is another correlation P' in $\Omega$ which behaves like P except that Bob has some extra output symbols (possibly), and one new input symbol $b^*$ which, when input, yields pair-wise orthogonal output distributions on Y for all elements of S, so that they can be perfectly distinguished simultaneously. 

To illustrate this idea, we now show that the class NS of generalised non-signalling correlations is not PW: If Alice and Bob's shared correlation P is the PR-box, then the conditional channels $P_{ax}$ are given by 

$$P_{ax}(y|b) = [x \oplus y = a \cdot b].$$ 

These channels are all pair-wise distinguishable, but of course, the required input on Bob's side depends on the pair. If NS were PW, then the existence of $P \in \mathrm{NS}$ would imply the existence of another correlation in NS where a single input on Bob's side would suffice to distinguish the 4 residual states. But obviously this would allow Bob to determine Alice's input, so this is a contradiction. Put another way, if a class is PW and contains the PR-box then it also contains signalling correlations. 
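
The two claims about the PR-box residual channels (pair-wise distinguishable, but not by a single common input) can be confirmed by enumeration. The sketch below is illustrative; its function names are assumptions made for this example and it uses the indicator form of the conditional channels given above.

```python
from itertools import combinations

def conditional_channel(a, x):
    """P_ax(y|b) for the PR-box: 1 if x XOR y = a AND b, else 0.
    Stored as {b: {y: probability}}."""
    return {b: {y: int((x ^ y) == (a & b)) for y in (0, 1)} for b in (0, 1)}

def distinguishing_inputs(ch1, ch2):
    """Inputs b on which the two output distributions have disjoint support."""
    return [b for b in (0, 1)
            if all(ch1[b][y] * ch2[b][y] == 0 for y in (0, 1))]

# The four residual channels on Bob's side, one per pair (a, x).
residuals = {(a, x): conditional_channel(a, x) for a in (0, 1) for x in (0, 1)}
pairs = list(combinations(residuals, 2))

# Every pair can be told apart by some input b ...
pairwise_ok = all(distinguishing_inputs(residuals[p], residuals[q])
                  for p, q in pairs)

# ... but no single b works for all six pairs at once.
common = set((0, 1))
for p, q in pairs:
    common &= set(distinguishing_inputs(residuals[p], residuals[q]))
```

The empty intersection reflects the simple counting fact that a single input with a binary output cannot separate four deterministic residual channels.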
On the other hand. 

Proposition 29. The class of bipartite correlations which can be implemented as local measurements on entangled quantum states (SE) is PW. 

Proof: Assume w.l.o.g. that Alice measures first: Alice inputs a (corresponding to her measuring some POVM on her side) and obtains outcome x, leaving a residual state $\rho_{ax}$ on Bob's side. The conditional channel $P_{ax}(y|b)$ is given by 

$$P_{ax}(y|b) = \operatorname{Tr} B^{(b)}_y \rho_{ax},$$ 

where $\{B^{(b)}_y\}_{y \in Y}$ denotes the POVM that Bob measures on input b, so if $P_{ax} \bowtie P_{a'x'}$ then there must be some input b on Bob's side, corresponding to a POVM with elements $\{B_y\}_{y \in Y}$ say, such that 

$$\forall y : \quad P_{ax}(y|b)\, P_{a'x'}(y|b) = (\operatorname{Tr} B_y \rho_{ax}) (\operatorname{Tr} B_y \rho_{a'x'}) = 0,$$ 

which implies that the residual states $\rho_{ax}$ and $\rho_{a'x'}$ are orthogonal (i.e. have disjoint support). 

A clique of pair-wise distinguishable conditional channels on Bob's side therefore corresponds to a clique of mutually orthogonal residual states on his side. Therefore, there is a single measurement which perfectly distinguishes all members of the clique, which we can obviously use to construct a correlation P' in the class of correlations SE with the required properties. ■ 

From this result and the previous example, it is 
clear that the PR-box cannot be perfectly imple- 
mented by shared entanglement (a fact which can 
alternatively be proved by the Tsirelson bound). 

Proposition 30. If $\Omega$ is a PW class of correlations, then the one-shot (and hence asymptotic) $\Omega$-assisted zero-error capacities $c_0^{\Omega}(\mathcal{N})$ and $C_0^{\Omega}(\mathcal{N})$ of a channel $\mathcal{N}$ only depend on the confusability graph $G(\mathcal{N})$. 

Proof: Let P be a correlation in $\Omega([c] \to X, Y \to [c])$ such that the standard 'wiring' yields the largest possible identity channel, i.e. one with $c = c_0^{\Omega}(\mathcal{N})$ symbols. 

' '^^ I otherwise. 

Write $\delta_y(x, x')$ if there is a single input y on Bob's
side such that $\forall z \neq z' : P_{zx:y} \perp P_{z'x':y}$. In words,
$\delta_y(x, x')$ means that if Bob knows that the channel
input was one of x or x' then he can distinguish which
z Alice chose by making input y to his side of the
correlation.

When Bob gets output symbol y from the channel,
let $C_y$ denote the set of possible inputs. He can decode
z perfectly iff $\delta_y(x, x')\ \forall x, x' \in C_y$. If we draw a
graph on X with edges labelled by outputs Y, with a
y-edge between x and x' iff $\delta_y(x, x')$, then (ignoring
edge labels and multiplicities) this graph must contain
$G(\mathcal{N})$.

Recalling the discussion after Definition 4, we
know that $c_0^{\Omega}(\chi(G(\mathcal{N}))) \le c_0^{\Omega}(H(\mathcal{N}))$, where
$\chi(G(\mathcal{N}))$ is the clique hypergraph of $G(\mathcal{N})$. By the PW
property of $\Omega$ it must be possible to find a new
correlation in $\Omega$ such that if Bob knows that x was in
any clique in this graph, then he can still determine
z. So the $\Omega$-assisted zero-error capacity of the clique
hypergraph of G is at least c.

Therefore, if $\Omega$ is PW then

$$c_0^{\Omega}(\chi(G(\mathcal{N}))) = c_0^{\Omega}(H(\mathcal{N})),$$

so the zero-error capacity depends only on the
confusability graph. ■
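
To illustrate what "depends only on the confusability graph" rules out, the following Python sketch compares two toy channels. It assumes a channel is given as a list of rows with channel[x][y] = N(y|x), and it uses the standard rule that two inputs are confusable when some output has positive probability under both. The helper names and both example channels are invented for this illustration and do not come from the text.

```python
from itertools import combinations


def hyperedges(channel):
    """For each output y, the set of inputs x with N(y|x) > 0."""
    n_out = len(channel[0])
    return {
        y: frozenset(x for x, row in enumerate(channel) if row[y] > 0)
        for y in range(n_out)
    }


def confusability_graph(channel):
    """Edges {x, x'} such that some output y is reachable from both inputs."""
    edges = set()
    for inputs in hyperedges(channel).values():
        edges.update(combinations(sorted(inputs), 2))
    return edges


# Toy channel 1: three outputs, each confusing a different pair of inputs.
pairwise = [
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
]

# Toy channel 2: a single output reachable from all three inputs.
collapsed = [[1.0], [1.0], [1.0]]

same_graph = confusability_graph(pairwise) == confusability_graph(collapsed)
same_hyperedges = (
    set(hyperedges(pairwise).values()) == set(hyperedges(collapsed).values())
)
print(same_graph, same_hyperedges)
```

The two channels have different sets of compatible inputs per output but the same confusability graph (a triangle), so the proposition implies that any PW class of correlations assigns them the same assisted zero-error capacity.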

Proposition 31. For a correlation $P \in C(A \to X, B \to Y)$,
let $\Lambda_b$ ($\Lambda_a$) be the graph on the $|A||X|$
($|B||Y|$) conditional channels on Bob's (Alice's) side
where edges denote pair-wise distinguishability. If
P belongs to a class which is both PW and non-signalling,
then

$$\chi(\Lambda_b) \ge |A|$$

and

$$\chi(\Lambda_a) \ge |B|,$$

where $\chi$ denotes the clique covering number. In
particular, these bounds apply to the correlation class
SE, the set of bipartite correlations which can be
implemented using entanglement.

Proof: Let $A^*$ ($B^*$) be a minimal clique covering
of $\Lambda_a$ ($\Lambda_b$). Suppose that P is in $\Omega$, which is PW
and non-signalling. By repeated use of the definition
of the PW property, $\Omega$ contains a
$P' \in C(A \cup A^* \to X \cup X', B \cup B^* \to Y \cup Y')$
such that for $q \in B^*$

$$\forall y \in Y \cup Y',\ (a, x) \in q,\ (a', x') \in q : \quad P'_{ax}(y|q)\, P'_{a'x'}(y|q) = 0.$$

This means that if Alice inputs $a \in A$ to P',
obtains output x, and then tells Bob a clique in
$B^*$ which contains (a, x), Bob can determine (a, x)
exactly. In particular, he discovers Alice's choice of
input in A without error, by Alice transmitting one
of $|B^*| = \chi(\Lambda_b)$ messages. Since non-signalling
correlations can't increase the zero-error capacity of
identity channels (a simple consequence of Theorem 7),
if $\chi(\Lambda_b) < |A|$ then P' cannot be non-signalling
(and similarly if $\chi(\Lambda_a) < |B|$). ■
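
As a concrete check of these bounds, the following Python sketch builds the distinguishability graph on Bob's side for the PR-box, taking the usual definition in which the outputs are uniformly random subject to x XOR y = a AND b. Two conditional channels are treated as distinguishable when some input b gives them disjoint supports, as in the proof of Proposition 29. The function names are illustrative only, and the brute-force clique cover is intended only for graphs this small.

```python
from itertools import combinations, product

A = X = B = Y = (0, 1)


def pr_box(x, y, a, b):
    """P(x, y | a, b) for the PR-box: uniform outputs with x XOR y = a AND b."""
    return 0.5 if (x ^ y) == (a & b) else 0.0


def bob_channel(a, x):
    """Conditional channel P_ax(y|b) on Bob's side, as {(y, b): probability}."""
    # Marginal probability of Alice's output x; by non-signalling it does
    # not depend on b, so b = 0 is used.
    p_x = sum(pr_box(x, y, a, 0) for y in Y)
    return {(y, b): pr_box(x, y, a, b) / p_x for y in Y for b in B}


def distinguishing_inputs(p, q):
    """Inputs b for which p(.|b) and q(.|b) have disjoint supports."""
    return [b for b in B if all(p[(y, b)] * q[(y, b)] == 0 for y in Y)]


def is_clique(vertices, edges):
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


def clique_cover_number(vertices, edges):
    """Smallest number of cliques covering all vertices (brute force)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for labels in product(range(k), repeat=len(vertices)):
            classes = [
                [v for v, lab in zip(vertices, labels) if lab == c]
                for c in range(k)
            ]
            if all(is_clique(cls, edges) for cls in classes):
                return k


channels = {(a, x): bob_channel(a, x) for a, x in product(A, X)}
edges = {
    frozenset((u, v))
    for u, v in combinations(channels, 2)
    if distinguishing_inputs(channels[u], channels[v])
}
cover = clique_cover_number(channels, edges)

# Inputs b that would separate all four conditional channels at once.
simultaneous = [
    b for b in B
    if all(
        b in distinguishing_inputs(channels[u], channels[v])
        for u, v in combinations(channels, 2)
    )
]
print(len(edges), cover, len(A), simultaneous)
```

Under these assumptions all six pairs of conditional channels are pairwise distinguishable, so one clique covers the graph and the covering number is 1, which is smaller than |A| = 2, while neither of Bob's inputs separates all four channels at once. This is consistent with the PR-box lying outside every class that is both PW and non-signalling, and in particular outside SE.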



